Using Decision Trees to predict customer behaviour
In the latest instalment of his series on Customer Relationship Management, Khalid
Sheikh explains how CRM can be used to predict customer behaviour, a vital need
in many businesses.
A Decision Tree is a predictive model that is used to make
predictions through a classification process. The predictive model is represented
as an upside-down Tree, with the root at the top (or on the left-hand side) and
the leaves at the bottom (or on the right-hand side).
Decision Trees represent rules. By following the Tree, you
can decipher the rules and understand why a record is classified in a certain
way. These rules can then be used to retrieve records falling into a certain
category, and the known behaviour of the category is the predicted behaviour
of the entity represented by the record.
In CRM, Decision Trees can be used to classify existing customer
records into customer segments that behave in a particular manner. The process
starts with data related to customers whose behaviour is already known; for
example, customers who have responded to a promotional campaign and those who
have not; or customers who have churned (left the service for a competitor)
and those who have not. The Decision Tree developed from this data gives us
the splitting attributes and criteria that divide customers into two categories.
Once the rules that determine the classes to which different customers belong
are known, they can be used to classify existing customers and predict behaviour
in future. For example, a customer whose record shows attributes similar to
those customers who have churned in the recent past is more likely to churn,
and that is the prediction that marketers are looking for to plan activities
to pre-empt the churn.
Classification classes
A set of classification classes can be defined for a database
having a large number of records such that each record belongs to one of the
given classes. The classification process decides the class to which a given
record belongs. The classification process in Decision Trees is also concerned
with generating a description or a (predictive) model for each class from the
given data set.
Predictive modelling
Predictive modelling is similar to the human learning experience
in using observations to form a model of the important characteristics of some
phenomenon. This approach uses generalisations of the real world
and the ability to fit new data into a general framework. Predictive modelling
can be used to analyse an existing database to determine some essential characteristics
(model) about the data set. The model is developed using a supervised learning
approach.
This has two phases: training and testing. Training builds
a model using a large sample of historical data called a training set, while
testing involves trying out the model on new, previously unseen data called
a test set, to determine its accuracy and physical performance characteristics.
Applications of predictive modelling include customer retention management,
credit approval, cross-selling, and direct marketing. Supervised classification
is one of the techniques associated with predictive modelling.
Supervised classification
In supervised classification:
- A training data set is used to generate the class
descriptions (predictive models). For each record of the training set, the
class to which it belongs is already known. Using the training set,
the classification process attempts to generate a description of each class.
These descriptions are then used to classify the unclassified
records.
- A test data set is used to measure the effectiveness of
a classification method. A set of test records whose classifications are
already known is passed through the classifier, and the resulting classifications
are compared with the known classifications. The percentage of matching classifications
is the measure of effectiveness of the classification method.
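
The effectiveness measure described above can be sketched in a few lines of Python. This is only an illustration: the function name `accuracy` and the four sample labels are invented here and do not come from any real customer data.

```python
def accuracy(predicted, actual):
    """Return the percentage of predictions that match the known classes."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("the test set must not be empty")
    matches = sum(1 for p, a in zip(predicted, actual) if p == a)
    return 100.0 * matches / len(actual)


# Hypothetical test set of four records (labels invented for illustration).
known = ["churn", "stay", "stay", "churn"]
predicted = ["churn", "stay", "churn", "churn"]
print(accuracy(predicted, known))  # 75.0
```

Three of the four predictions match the known classes, so the classifier in this toy case is 75% effective.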
There are several approaches to supervised classification.
Decision Trees are especially attractive in the data-mining environment because they
represent rules. Rules can be easily expressed in natural language, and they
can be easily mapped to a database access language such as SQL. To summarise:
- A Decision Tree represents a series of questions.
Good questions produce a short series of questions.
- Each question determines what follow-up question is best
to be asked next.
- The leaves represent the most specific classification
for a data record. Decision Trees are drawn with the root at the top (or on
the left-hand side) and the leaves at the bottom (or on the right). The root
represents the most general classification (the entire dataset); the leaves
represent the most specific classification. A data record enters the Decision
Tree at the root node (the top) and then works its way down until
it reaches a leaf node. The leaf node determines the most specific classification
of the record.
- Effectiveness can be enhanced by pruning the incompetent
branches. Some paths are better than others because the rules associated
with them are better. The predictive effectiveness of the whole Tree can be
enhanced by pruning incompetent branches.
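
To illustrate how a rule read off a Decision Tree might be mapped to SQL, here is a minimal sketch. The helper `rule_to_sql`, the table name `golf_records`, and the column names are all hypothetical; the rule itself (sunny outlook, humidity of 75% or less) is taken from the golf example later in this article.

```python
def rule_to_sql(table, conditions):
    """Build a SELECT statement that retrieves records matching one rule.

    `conditions` is a list of (column, operator, value) triples, one for
    each test on the path from the root of the Tree to a leaf.
    """
    if not conditions:
        raise ValueError("a rule needs at least one condition")
    allowed_ops = {"=", "<", "<=", ">", ">="}
    clauses = []
    for column, op, value in conditions:
        if op not in allowed_ops:
            raise ValueError(f"unsupported operator: {op}")
        if isinstance(value, str):
            # Quote string literals, doubling any embedded single quotes.
            literal = "'" + value.replace("'", "''") + "'"
        else:
            literal = str(value)
        clauses.append(f"{column} {op} {literal}")
    return f"SELECT * FROM {table} WHERE " + " AND ".join(clauses)


# Hypothetical table and column names, used only for illustration.
rule = [("outlook", "=", "sunny"), ("humidity", "<=", 75)]
print(rule_to_sql("golf_records", rule))
# SELECT * FROM golf_records WHERE outlook = 'sunny' AND humidity <= 75
```

The sketch builds the query as text only to make the rule-to-SQL correspondence visible; against a real database, parameterised queries would be the safer choice.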
Building the Decision Tree Algorithm
- The algorithm attempts to find the test that will
split the records in the best possible manner among the desired classes.
- At each node below the root, whatever rule
works best to split that node's subset is applied.
- The process of finding each additional level of the Tree
continues. The Tree is allowed to grow until you cannot find better ways to
split the input records.
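
The article does not say how the "best" test is measured. One common choice, assumed here purely for illustration, is information gain based on entropy. The sketch below picks the attribute with the highest gain; the function names and the four customer records are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def information_gain(records, attribute, target):
    """Reduction in entropy obtained by splitting `records` on `attribute`."""
    groups = defaultdict(list)
    for record in records:
        groups[record[attribute]].append(record[target])
    total = len(records)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy([r[target] for r in records]) - remainder


def best_split(records, attributes, target):
    """Return the attribute whose split yields the highest information gain."""
    return max(attributes, key=lambda a: information_gain(records, a, target))


# Hypothetical customer records, invented purely for illustration.
customers = [
    {"contract": "monthly", "region": "north", "churned": "yes"},
    {"contract": "monthly", "region": "south", "churned": "yes"},
    {"contract": "annual", "region": "north", "churned": "no"},
    {"contract": "annual", "region": "south", "churned": "no"},
]
print(best_split(customers, ["contract", "region"], "churned"))  # contract
```

In this toy data, splitting on `contract` separates churners from non-churners completely, while splitting on `region` tells us nothing, so `contract` is chosen as the test at the root.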
Process of creating Decision Trees
All Decision Tree construction methods are based on the principle
of recursively partitioning the data set till homogeneity is achieved. The construction
of a Decision Tree involves the following phases:
- Construction phase:
The initial Decision Tree is constructed in this phase, based on the entire
training data set. It requires recursively partitioning the training set into
two or more sub-partitions using a splitting criterion, until a stopping
criterion is met.
- Pruning phase: The pruning phase involves removing some
of the lower branches and nodes to improve performance. The Tree constructed
in the previous phase may not result in the best possible set of rules due
to overfitting. Often the training data set used for constructing a Decision
Tree is not properly representative of the real-life situation and may
contain noise. While building a Decision Tree from a noisy training data set,
it might be prudent to grow the Decision Tree just deeply enough to guard
against the possibility of incorporating unnecessary features that make the Tree
difficult to comprehend. A Decision Tree T is said to overfit the training
data if there exists some other Decision Tree T', which is a simplification
of T, such that T has smaller error over the training set but T' has
smaller error over the entire distribution of instances. This situation is
indicative of noise in the training set.
- Processing the pruned Tree: In
this step, the Decision Tree is processed to improve understandability.
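
The construction phase can be sketched as a short recursive function. This is a simplified illustration, not a full algorithm: it handles categorical attributes only, omits pruning, and assumes a splitting criterion (fewest misclassified records) and a stopping criterion (a pure partition, or no attributes left) that the article does not prescribe. All names and the training records are hypothetical.

```python
from collections import Counter


def majority(labels):
    """Most common class label in a list."""
    return Counter(labels).most_common(1)[0][0]


def misclassified(records, attribute, target):
    """Count records outside the majority class of their partition."""
    partitions = {}
    for r in records:
        partitions.setdefault(r[attribute], []).append(r[target])
    return sum(len(p) - Counter(p).most_common(1)[0][1]
               for p in partitions.values())


def build_tree(records, attributes, target):
    """Recursively partition `records` until each partition is homogeneous."""
    labels = [r[target] for r in records]
    # Stopping criterion (assumed): pure partition, or no attributes remain.
    if len(set(labels)) == 1 or not attributes:
        return majority(labels)
    # Splitting criterion (assumed): attribute with fewest misclassifications.
    best = min(attributes, key=lambda a: misclassified(records, a, target))
    remaining = [a for a in attributes if a != best]
    node = {"attribute": best, "children": {}}
    for value in sorted({r[best] for r in records}):
        subset = [r for r in records if r[best] == value]
        node["children"][value] = build_tree(subset, remaining, target)
    return node


# Hypothetical training records, invented purely for illustration.
training = [
    {"contract": "monthly", "complaints": "yes", "churned": "yes"},
    {"contract": "monthly", "complaints": "no", "churned": "no"},
    {"contract": "annual", "complaints": "yes", "churned": "no"},
    {"contract": "annual", "complaints": "no", "churned": "no"},
]
tree = build_tree(training, ["contract", "complaints"], "churned")
print(tree)
```

Each internal node records its splitting attribute and one child per attribute value; each leaf is simply a class label.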
Classification process
- A record enters the Decision Tree at the root node. At
the root, a test is applied to determine which child node the record will
encounter next.
- Splitting attribute: Associated with every node of the
Decision Tree is an attribute, called the splitting attribute, whose values
determine the partitioning of the data set when the node is expanded. In the
example described next, outlook, humidity, and windy are the splitting attributes.
- Splitting criterion: The qualifying condition on the splitting
attribute is called the splitting criterion. For a numeric attribute,
the criterion can be an equation or an inequality. For a categorical attribute,
it is a membership condition on a subset of values. In the example, humidity
of 75% or less, or above 75%, are the criteria for the humidity attribute; whereas
the outlook being sunny, overcast, or rainy are the criteria for the outlook
splitting attribute at the root.
- This process is repeated until the record arrives at a
leaf node. All the records that end up at a given leaf of the Tree are classified
in the same way. There is a unique path from the root to each leaf. The path
is a rule, which is used to classify the records.
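
The traversal described above can be sketched as follows, using the golf rules from the example in this article. The nested-dictionary layout, the branch labels ("normal", "high", "calm"), and the sample record are assumptions made for illustration.

```python
# Each internal node holds a test on its splitting attribute and one child
# per outcome of that test; each leaf is a class label.
GOLF_TREE = {
    "test": lambda r: r["outlook"],
    "children": {
        "sunny": {
            # Splitting criterion on humidity: 75% or less versus above 75%.
            "test": lambda r: "high" if r["humidity"] > 75 else "normal",
            "children": {"normal": "play", "high": "do not play"},
        },
        "overcast": "play",
        "rainy": {
            "test": lambda r: "windy" if r["windy"] else "calm",
            "children": {"calm": "play", "windy": "do not play"},
        },
    },
}


def classify(record, node=GOLF_TREE):
    """Walk from the root to a leaf; return the class and the path taken."""
    path = []
    while isinstance(node, dict):
        branch = node["test"](record)  # apply the test at this node
        path.append(branch)
        node = node["children"][branch]  # move to the matching child
    return node, path


# Hypothetical record, invented for illustration.
print(classify({"outlook": "sunny", "humidity": 80, "windy": False}))
# ('do not play', ['sunny', 'high'])
```

The returned path is the rule that classified the record: here, "if it is sunny and the humidity is above 75%, do not play".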
Example: The example has been adapted from the book Data
Mining Techniques by Arun K Pujari, published in 2001 by Universities
Press, Hyderabad. Based on the training data set shown in Figure 1, the task of
the supervised classification process is to find a set of rules that show which
values of outlook, temperature, humidity, and wind determine whether a golf
player would choose to play golf. The training data, which contains the attribute
values of golf players who decided to play and who decided not to play, is used
to formulate the rules in Table 1. The rules are tested by making a prediction
about the behaviour depicted in the test data set and then comparing the predicted
behaviour with the actual behaviour that is already known. A match between the
predicted and actual behaviour, shown by a check mark, confirms that the rule is correct,
while a mismatch between the predicted and actual behaviour, shown by a cross, indicates
that the rule is incorrect.
The accuracy of the classifier is determined by the percentage
of the test data set that is correctly classified. The last column of the second
table in Figure 1 shows the known classification of the records in the
test set; this classification is assumed to be correct. The
column also shows whether the classification determined by the Decision Tree
matches the known classification. A check mark indicates that the classification
determined by the Tree is the same as that shown in the test data. A cross indicates
that the determined classification is the opposite of the one shown in the column. Table 1
shows the accuracy of each rule on this basis. Once fairly accurate
rules are known, the Decision Tree can be built as shown in Figure 2. The Tree
is then used to find the class to which a data element belongs. The behaviour
of the class is the predicted behaviour of the golf player in the situation
described by the data element.
| Rule# | If... | and if... | then... | Accuracy |
|---|---|---|---|---|
| 1 | it is sunny | the humidity is 75% or less | play | 50% |
| 2 | it is sunny | the humidity is above 75% | do not play | 50% |
| 3 | it is overcast | - | play | 66.67% |
| 4 | it is rainy | not windy | play | 50% |
| 5 | it is rainy | windy | do not play | 0% |
The author is associate professor of Supply Chain Management
at S P Jain Institute of Management & Research, Mumbai. He can be contacted
at khalid_sheikh@hotmail.com