DataScope Clusterer Module

Segmentation and clustering

In many cases, you want to segment your customers, business partners, employees, and so on by several attributes stored about them in your database. With these segments, you can communicate with each group more effectively, work together better, or address their common needs. A special set of data mining algorithms is useful here: clustering algorithms. They group the records of a large database into segments of similar records based on several attributes (database fields).

During learning, a clustering model creates segments in n-dimensional space (where n is the number of examined attributes). Later, these segments are used to classify other records in the database. Naturally, the "goodness" of a classification depends on the shapes of the segments: the more isolated they are, the better the clustering is. Keep this in mind when using the result of a clustering.
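The idea of segments in n-dimensional space can be illustrated with a minimal Python sketch. It is not DataScope code: it assumes each segment is represented by a single center point and that distance is Euclidean, and the names `nearest_segment` and `separation_ratio` are hypothetical. The ratio shown is only one simple way to express how isolated the segments are.

```python
import math

def nearest_segment(record, centers):
    """Return the index of the center closest to the record (Euclidean distance)."""
    distances = [math.dist(record, c) for c in centers]
    return distances.index(min(distances))

def separation_ratio(records, centers):
    """Smallest center-to-center distance divided by the mean distance
    from each record to its own center. Larger means more isolated segments.
    Assumes at least two centers and a non-zero mean distance."""
    labels = [nearest_segment(r, centers) for r in records]
    within = sum(math.dist(r, centers[k]) for r, k in zip(records, labels)) / len(records)
    between = min(
        math.dist(a, b)
        for i, a in enumerate(centers)
        for b in centers[i + 1:]
    )
    return between / within

# Two attributes (n = 2), two assumed segment centers, four sample records.
centers = [(0.0, 0.0), (10.0, 10.0)]
records = [(0.0, 1.0), (1.0, 0.0), (10.0, 9.0), (9.0, 10.0)]

labels = [nearest_segment(r, centers) for r in records]
ratio = separation_ratio(records, centers)
print(labels, round(ratio, 2))
```

In practice the attributes would normally be scaled to comparable ranges first, otherwise the attribute with the largest values dominates the distance.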

Although the algorithms are quite complex, they are managed through a user-friendly interface, the Model Wizard, which guides you through the tasks of creating the clustering model. If you are familiar with the selected algorithm, you can also fine-tune its specific parameters in its Properties dialog.

The clustering model is an integrated component of your project. You can use all DataScope columns for building the model (imported ones as well as those created by the Expression Builder, type conversions and so on). The result columns can be reused for building other models or expressions, or displayed on DataScope charts.

Changes in model input data affect the model. You can specify whether you want the algorithm to "re-learn" the model if its input data changes; the columns calculated by the model are always refreshed when their input columns change. You can also re-train and re-test your model manually at any time. For example, if you want to analyze changes in customer behavior, build a clustering model on a longer period of time, then reimport the current month without re-learning the clustering. In this way, you can see which customers have moved to new clusters.
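The customer behavior example can be sketched in Python as follows. This is an illustration only, not DataScope's implementation: it assumes the trained model can be reduced to a fixed list of cluster centers, and the function names and sample customer data are hypothetical. The centers stay frozen (no re-learning) while the new month is classified.

```python
import math

def assign(record, centers):
    """Index of the nearest cluster center for one record."""
    return min(range(len(centers)), key=lambda k: math.dist(record, centers[k]))

def moved_customers(previous, current, centers):
    """Compare cluster assignments of two periods using the same frozen centers.

    previous, current: dict mapping customer id -> tuple of attribute values.
    Returns a dict mapping customer id -> (old cluster, new cluster) for every
    customer present in both periods whose cluster changed.
    """
    moved = {}
    for customer_id, attributes in current.items():
        if customer_id not in previous:
            continue  # new customer, nothing to compare against
        old = assign(previous[customer_id], centers)
        new = assign(attributes, centers)
        if old != new:
            moved[customer_id] = (old, new)
    return moved

# Centers learned earlier on the longer period (assumed values).
centers = [(0.0, 0.0), (10.0, 10.0)]
previous = {"A": (1.0, 1.0), "B": (9.0, 9.0), "C": (2.0, 0.0)}
current = {"A": (8.0, 9.0), "B": (9.0, 10.0), "C": (1.0, 1.0), "D": (0.0, 0.0)}

moved = moved_customers(previous, current, centers)
print(moved)
```

Customer "D" appears only in the current month, so the sketch skips it rather than reporting a move.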

The models can be exported in the standard PMML format for use with external evaluation tools or with DataScope Model Executor.

Included clustering algorithms:

Kohonen
Shepherd
Fuzzy C-Means
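To give a feel for how Fuzzy C-Means differs from assigning each record to exactly one cluster, the sketch below computes the membership degrees of one record using the textbook formula u_i = 1 / sum_j (d_i / d_j)^(2/(m-1)). It is a generic illustration, not DataScope's implementation: the function name, the Euclidean distance, and the default fuzzifier m = 2 are assumptions.

```python
import math

def fuzzy_memberships(record, centers, m=2.0):
    """Membership degree of a record in each cluster (textbook Fuzzy C-Means).

    m > 1 is the fuzzifier: values close to 1 give nearly hard assignments,
    larger values give softer ones. The returned degrees sum to 1.
    """
    distances = [math.dist(record, c) for c in centers]
    # A record lying exactly on a center belongs fully to that center
    # (shared equally if several centers coincide with it).
    zeros = [i for i, d in enumerate(distances) if d == 0.0]
    if zeros:
        return [1.0 / len(zeros) if i in zeros else 0.0 for i in range(len(centers))]
    exponent = 2.0 / (m - 1.0)
    return [
        1.0 / sum((d_i / d_j) ** exponent for d_j in distances)
        for d_i in distances
    ]

centers = [(0.0, 0.0), (4.0, 0.0)]
memberships = fuzzy_memberships((1.0, 0.0), centers)
print([round(u, 3) for u in memberships])
```

A full Fuzzy C-Means run alternates this membership step with recomputing the centers as membership-weighted means until the values stabilize; only the membership step is shown here.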

Clustering Module Features:

Wizard-guided model training.
Data filters with custom expressions for defining training and test data sets.
Random sampling of data sets.
Automatic validation of training and test data sets.
Learning from data sets containing unknown (n/a) values.