Decision Trees


Decision trees allow you to develop classification systems that predict or classify future observations based on a set of decision rules. If you have data divided into classes that interest you (for example, high- versus low-risk loans, subscribers versus nonsubscribers, voters versus nonvoters, or types of bacteria), you can use your data to build rules that classify old or new cases as accurately as possible. For example, you might build a tree that classifies credit risk or purchase intent based on age and other factors.
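Once a tree has been built, classifying a case means starting at the root, applying each node's test, and following the matching branch until a leaf supplies the class. The following is a minimal sketch of that traversal, assuming a binary tree with numeric "less than or equal to" tests. The `Leaf`, `Split` and `classify` names, the `age` and `debt_ratio` fields and the thresholds are all hypothetical and were chosen for illustration, not learned from real data.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Leaf:
    label: str  # class predicted for every case that reaches this leaf


@dataclass
class Split:
    attribute: str   # field tested at this node
    threshold: float # cases with value <= threshold go left, others go right
    left: "Node"
    right: "Node"


Node = Union[Leaf, Split]


def classify(node: Node, case: dict) -> str:
    """Walk from the root to a leaf, following the branch each test selects."""
    while isinstance(node, Split):
        node = node.left if case[node.attribute] <= node.threshold else node.right
    return node.label


# Hypothetical credit-risk tree: structure and thresholds are illustrative only.
risk_tree = Split(
    attribute="age", threshold=25,
    left=Leaf("high risk"),
    right=Split(
        attribute="debt_ratio", threshold=0.4,
        left=Leaf("low risk"),
        right=Leaf("high risk"),
    ),
)

print(classify(risk_tree, {"age": 22, "debt_ratio": 0.1}))  # high risk
print(classify(risk_tree, {"age": 40, "debt_ratio": 0.2}))  # low risk
print(classify(risk_tree, {"age": 40, "debt_ratio": 0.6}))  # high risk
```

Each root-to-leaf path is one decision rule, so a prediction only ever looks at the attributes tested along that path.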

This approach, sometimes known as rule induction, has several advantages. First, the reasoning process behind the model is clearly evident when browsing the tree. This is in contrast to other "black box" modeling techniques in which the internal logic can be difficult to work out.

Second, the process automatically includes in its rules only the attributes that matter in making a decision. Attributes that do not contribute to the accuracy of the tree are ignored. This can yield useful information about the data, and it can be used to reduce the data to relevant fields only before training another learning technique, such as a neural network.
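One way to see why irrelevant attributes drop out is to score each candidate split. The sketch below uses information gain (the entropy-based criterion of ID3 and the C4.5 family). This is an assumption: the text does not name a splitting criterion, and other tree algorithms use different measures. The small car dataset is hypothetical. By construction, `colour` is independent of the decision, so its gain is zero and a greedy builder that always splits on the highest-gain attribute would never select it.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())


def information_gain(rows, attribute, target):
    """Reduction in entropy of `target` after splitting `rows` on `attribute`."""
    parent = entropy([r[target] for r in rows])
    weighted = 0.0
    for value in {r[attribute] for r in rows}:
        subset = [r[target] for r in rows if r[attribute] == value]
        weighted += len(subset) / len(rows) * entropy(subset)
    return parent - weighted


# Hypothetical data: only cars with an MOT and low mileage are worth buying;
# colour is varied independently of the decision.
rows = [
    {"mot": m, "mileage": k, "colour": c,
     "decision": "BUY" if (m, k) == ("yes", "low") else "SKIP"}
    for m in ("yes", "no") for k in ("low", "high") for c in ("red", "blue")
]

for attribute in ("mot", "mileage", "colour"):
    print(attribute, round(information_gain(rows, attribute, "decision"), 3))
# mot 0.311
# mileage 0.311
# colour 0.0
```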

Generated decision trees can be converted into a collection of if-then rules (a ruleset), which in many cases show the information in a more comprehensible form. The decision-tree presentation is useful when you want to see how attributes in the data can split, or partition, the population into subsets relevant to the problem. The ruleset presentation is useful if you want to see how particular groups of items relate to a specific conclusion. For example, the following rule gives us a profile for a group of cars that is worth buying:

IF mot = 'yes' AND mileage = 'low' THEN -> 'BUY'
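Converting a tree into a ruleset amounts to collecting one rule per root-to-leaf path. Each rule combines the tests along that path with AND and concludes with the leaf's class. The following is a minimal sketch of that conversion, assuming a tree stored as nested dictionaries with categorical branches. The `to_rules` and `format_rule` helpers, the dictionary layout and the 'SKIP' class are hypothetical, and the output mimics the rule format shown above.

```python
from typing import Iterator, Union

# A tree is either a class label (a leaf) or a dict that tests one attribute
# and maps each attribute value to a subtree.
Tree = Union[str, dict]


def to_rules(tree: Tree, conditions: tuple = ()) -> Iterator[tuple]:
    """Yield one (conditions, conclusion) pair per root-to-leaf path."""
    if isinstance(tree, str):
        yield conditions, tree
        return
    for value, subtree in tree["branches"].items():
        yield from to_rules(subtree, conditions + ((tree["attribute"], value),))


def format_rule(conditions: tuple, conclusion: str) -> str:
    """Render a path as an IF-THEN rule; a bare leaf becomes IF TRUE."""
    tests = " AND ".join(f"{attr} = '{val}'" for attr, val in conditions) or "TRUE"
    return f"IF {tests} THEN -> '{conclusion}'"


# Hypothetical car-buying tree.
car_tree = {
    "attribute": "mot",
    "branches": {
        "yes": {"attribute": "mileage", "branches": {"low": "BUY", "high": "SKIP"}},
        "no": "SKIP",
    },
}

for conditions, conclusion in to_rules(car_tree):
    print(format_rule(conditions, conclusion))
# IF mot = 'yes' AND mileage = 'low' THEN -> 'BUY'
# IF mot = 'yes' AND mileage = 'high' THEN -> 'SKIP'
# IF mot = 'no' THEN -> 'SKIP'
```

Rules that share a conclusion could then be grouped to profile everything that leads to, say, 'BUY'. That grouping is the ruleset view of the tree.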