Chapter 6 - Decision Trees
Material
I have appended a zip file of the files we will discuss at the seminar. It includes code examples (to visualize decision trees) as well as a PDF discussing topics and summarizing the chapter. The zip file can be found here.
Summary
Decision trees are a class of machine learning algorithms that are powerful predictors, especially when combined into random forests (see Chapter 7 - Ensemble Learning and Random Forests).
Decision trees are binary, generally unbalanced trees in which each decision node is characterized by a feature and a threshold.
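To make this concrete, here is a minimal sketch (not the seminar code from the zip file) that trains a small tree with scikit-learn and prints its learned feature/threshold decisions; `max_depth=2` is an arbitrary illustrative choice:

```python
# Minimal sketch: train a shallow decision tree on the iris dataset
# and print its structure. Assumes scikit-learn is installed.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(iris.data, iris.target)

# Each internal node is a feature/threshold pair, as described above.
print(export_text(tree_clf, feature_names=iris.feature_names))
```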
Impurity measures
The cost function of a decision tree split can be described by

$$J(k, t_k) = \frac{m_\text{left}}{m}\, G_\text{left} + \frac{m_\text{right}}{m}\, G_\text{right},$$

where $k$ is the feature and $t_k$ the threshold defining the split, $G_\text{left/right}$ is the impurity measure for the left and the right path in the tree, $m_\text{left/right}$ is the number of samples on either the left or the right side of the subtree, and $m$ is the total number of samples in the subtree.
For classification: the impurity measure can either be the Gini impurity, $G_i$, or the Shannon entropy, $H_i$.
For regression: the impurity measure is the mean squared error, $\text{MSE}_i$.
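As an illustration, the sketch below (the helper names `gini` and `split_cost` are my own, not from the book) evaluates this cost for a single candidate split of one feature, using the Gini impurity defined in the next subsection:

```python
import numpy as np

def gini(y):
    # Gini impurity of a label array (formally defined below).
    if len(y) == 0:
        return 0.0
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def split_cost(x, y, threshold, impurity=gini):
    # J(k, t_k): impurity of the two subsets, weighted by their sizes.
    left, right = y[x <= threshold], y[x > threshold]
    m = len(y)
    return (len(left) / m) * impurity(left) + (len(right) / m) * impurity(right)

# Toy example: the labels separate cleanly at x = 2.5, so the cost is 0.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([0, 0, 1, 1])
print(split_cost(x, y, 2.5))  # -> 0.0
```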
Gini Impurity
The Gini impurity is defined as

$$G_i = 1 - \sum_{k=1}^{n} p_{i,k}^2,$$

where $p_{i,k}$ is the ratio of class $k$ samples among all the training samples in the $i$:th node.
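A direct translation of the formula into code (a hypothetical helper, assuming NumPy):

```python
import numpy as np

def gini_from_probs(p):
    # G_i = 1 - sum_k p_{i,k}^2, where p holds the class ratios in node i.
    p = np.asarray(p)
    return 1.0 - np.sum(p ** 2)

print(gini_from_probs([1.0, 0.0]))  # 0.0: a pure node
print(gini_from_probs([0.5, 0.5]))  # 0.5: maximally impure for two classes
```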
Shannon Entropy
The Shannon entropy is defined as

$$H_i = -\sum_{\substack{k=1 \\ p_{i,k} \neq 0}}^{n} p_{i,k} \log_2(p_{i,k}).$$
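The corresponding sketch for the entropy (note that the sum runs only over classes with $p_{i,k} \neq 0$):

```python
import numpy as np

def entropy_from_probs(p):
    # H_i = -sum_k p_{i,k} * log2(p_{i,k}), skipping classes with p_{i,k} = 0.
    p = np.asarray(p)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

print(entropy_from_probs([1.0, 0.0]))  # zero entropy: a pure node
print(entropy_from_probs([0.5, 0.5]))  # 1.0 bit: maximally impure for two classes
```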
Mean Squared Error (MSE)
The mean squared error is given by

$$\text{MSE}_i = \frac{1}{m_i} \sum_{j \in \text{node } i} \left(\hat{y}_i - y^{(j)}\right)^2, \qquad \hat{y}_i = \frac{1}{m_i} \sum_{j \in \text{node } i} y^{(j)},$$

where $m_i$ is the number of samples in the $i$:th node and $\hat{y}_i$ is the node's predicted value (the mean target over the node).
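A small sketch of the same computation for one node (the helper name `node_mse` is my own):

```python
import numpy as np

def node_mse(y):
    # MSE_i: mean squared deviation from the node's mean target value.
    y = np.asarray(y, dtype=float)
    y_hat = y.mean()  # the node's prediction
    return np.mean((y_hat - y) ** 2)

print(node_mse([3.0, 3.0]))       # 0.0: constant targets are predicted perfectly
print(node_mse([1.0, 2.0, 3.0]))  # ~0.667
```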
Regularization
- max_depth: Maximum depth of the decision tree; forces an upper limit on its size.
- min_samples_split: Minimum number of samples a node must have before it can be split.
- min_samples_leaf: Minimum number of samples a node must have to be a leaf node.
- min_weight_fraction_leaf: Same as min_samples_leaf, but expressed as a fraction of the total number of weighted samples.
- max_leaf_nodes: Maximum number of leaf nodes allowed.
- max_features: Maximum number of features evaluated for splitting a node.
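A minimal sketch showing how some of these hyperparameters are passed to scikit-learn (the particular values are illustrative, not recommendations):

```python
from sklearn.datasets import make_moons
from sklearn.tree import DecisionTreeClassifier

X, y = make_moons(n_samples=200, noise=0.25, random_state=42)

unconstrained = DecisionTreeClassifier(random_state=42).fit(X, y)
regularized = DecisionTreeClassifier(
    max_depth=4,
    min_samples_leaf=5,
    max_leaf_nodes=16,
    random_state=42,
).fit(X, y)

# The unconstrained tree typically grows much deeper, fitting the noise.
print(unconstrained.get_depth(), regularized.get_depth())
```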
Session agenda
- Summary of the chapter. Slides will be provided shortly.
- Discussion of when decision trees can be a powerful tool.
- Discussion of the decision tree's sensitivity to data perturbations.
- Exercise discussions (Focus: 2, 3, 4. Extra: 7+8)
Extra
Try to reflect on the greedy heuristic we use to train decision trees: it produces a reasonably good split at each node, but the resulting tree is not guaranteed to be optimal overall (finding the optimal tree is known to be an NP-complete problem). Are there similar methods that are proven optimal?