Chapter 6 - Decision Trees
Material
I have attached a zip file with the material we will discuss at the seminar. It includes code examples (to visualize decision trees) as well as a PDF discussing topics and summarizing the chapter. The zip file can be found here.
Summary
Decision trees are a class of machine learning algorithms that are powerful predictors, especially when combined into random forests (see Chapter 7 - Ensemble Learning and Random Forests).
Decision trees are binary, generally unbalanced trees in which each decision node is characterized by a feature $k$ and a threshold $t_k$ (a small sketch of this structure follows below).
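As a minimal sketch of this structure (our own example, assuming the scikit-learn API used in the book's code samples), we can fit a shallow classifier and print the $(k, t_k)$ pair stored at each internal node:

```python
# Fit a small decision tree and inspect the (feature, threshold) pair
# chosen at each internal node. Assumes scikit-learn is installed.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
tree_clf = DecisionTreeClassifier(max_depth=2, random_state=42)
tree_clf.fit(iris.data, iris.target)

# In scikit-learn's tree structure, leaves have children_left == children_right;
# every other node stores a feature index k and a threshold t_k.
tree = tree_clf.tree_
for node in range(tree.node_count):
    if tree.children_left[node] != tree.children_right[node]:  # internal node
        k, t_k = tree.feature[node], tree.threshold[node]
        print(f"node {node}: split on feature k={k} at threshold t_k={t_k:.3f}")
```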
Impurity measurements
The cost function of a decision tree can be described by
$$J(k, t_k) = \frac{m_\text{left}}{m} I_\text{left} + \frac{m_\text{right}}{m} I_\text{right}$$
where $I_\text{left/right}$ is the impurity of the left or right subset produced by the split, $m_\text{left/right}$ is the number of samples in that subset, and $m$ is the total number of samples in the node being split.
For classification, $I_\text{node}$ can be either the Gini impurity, $G_i$, or the Shannon entropy, $H_i$.
For regression, $I_\text{node}$ is the mean squared error, $\text{MSE}_\text{node}$.
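A short sketch of how this cost is evaluated for one candidate split. The helper name `cart_cost` and the toy data are our own, and for illustration we pass the variance of the node's targets as the per-sample impurity (a regression-style impurity); any of the impurity functions defined below could be plugged in instead:

```python
import numpy as np

def cart_cost(y_left, y_right, impurity):
    """J(k, t_k) = (m_left / m) * I_left + (m_right / m) * I_right."""
    m_left, m_right = len(y_left), len(y_right)
    m = m_left + m_right
    return (m_left / m) * impurity(y_left) + (m_right / m) * impurity(y_right)

# Toy regression targets: a split separating the low values from the high
# values yields a low weighted impurity, so CART would prefer it.
y = np.array([1.0, 1.1, 0.9, 5.0, 5.2])
print(cart_cost(y[:3], y[3:], impurity=np.var))  # small cost: a good split
```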
Gini Impurity
The Gini impurity is defined as
$$G_i = 1 - \sum_{k=1}^{n} p_{i,k}^2$$
where $p_{i,k}$ is the ratio of class-$k$ samples among the training samples in the $i$-th node.
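The formula translates directly to code; the helper name `gini` is our own:

```python
import numpy as np

def gini(y):
    """G_i = 1 - sum_k p_{i,k}^2, with p_{i,k} the class ratios in node i."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

print(gini([0, 0, 1, 1]))  # 0.5: maximally mixed for two classes
print(gini([0, 0, 0, 0]))  # 0.0: a pure node
```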
Shannon Entropy
The Shannon entropy is defined as
$$H_i = -\sum_{\substack{k=1 \\ p_{i,k} \neq 0}}^{n} p_{i,k} \log_2\!\left(p_{i,k}\right).$$
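The same, as code (the helper name `entropy` is our own). The condition $p_{i,k} \neq 0$ is handled implicitly: `np.unique` only returns classes that actually occur, so no ratio is ever zero:

```python
import numpy as np

def entropy(y):
    """H_i = -sum over classes with p_{i,k} != 0 of p_{i,k} * log2(p_{i,k})."""
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

print(entropy([0, 0, 1, 1]))  # 1.0 bit: maximally mixed for two classes
```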
Mean Squared Error (MSE)
The mean squared error is given by
$$\text{MSE}_\text{node} = \sum_{i \in \text{node}} \left(\hat{y}_\text{node} - y^{(i)}\right)^2, \qquad \hat{y}_\text{node} = \frac{1}{m_\text{node}} \sum_{i \in \text{node}} y^{(i)}.$$
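As code (the helper name `mse_node` is our own). Note that, as defined here, $\text{MSE}_\text{node}$ is a sum of squared errors around the node mean, not an average:

```python
import numpy as np

def mse_node(y):
    """MSE_node = sum_i (y_hat_node - y^(i))^2, with y_hat_node the node mean."""
    y = np.asarray(y, dtype=float)
    y_hat = y.mean()                 # y_hat_node = (1 / m_node) * sum_i y^(i)
    return np.sum((y_hat - y) ** 2)

print(mse_node([1.0, 2.0, 3.0]))  # 2.0: errors of 1, 0, 1 squared and summed
```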
Regularization
These hyperparameters (as exposed by scikit-learn's decision trees) restrict the tree's growth; a usage sketch follows the list.
- max_depth: Maximum depth the tree is allowed to grow to.
- min_samples_split: Minimum number of samples a node must have before it can be split.
- min_samples_leaf: Minimum number of samples a leaf node must have.
- min_weight_fraction_leaf: Same as min_samples_leaf, but expressed as a fraction of the total number of weighted samples.
- max_leaf_nodes: Maximum number of leaf nodes allowed.
- max_features: Maximum number of features evaluated when splitting a node.
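A sketch of passing these hyperparameters to scikit-learn's DecisionTreeClassifier; the values here are arbitrary illustrations, not tuned settings:

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree_clf = DecisionTreeClassifier(
    max_depth=4,                   # cap the depth of the tree
    min_samples_split=10,          # a node needs >= 10 samples to be split
    min_samples_leaf=4,            # every leaf must keep >= 4 samples
    min_weight_fraction_leaf=0.0,  # leaf minimum as a weighted fraction instead
    max_leaf_nodes=16,             # grow at most 16 leaves
    max_features=None,             # consider all features at each split
    random_state=42,
)
tree_clf.fit(X, y)
print(tree_clf.get_depth(), tree_clf.get_n_leaves())
```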
Session agenda
- Summary of the chapter. Slides will be provided shortly.
- Discussion of when decision trees can be a powerful tool.
- Discussion of the decision tree's sensitivity to data perturbations.
- Exercise discussions (focus: 2, 3, 4; extra: 7 and 8).
Extra
Try to reflect on the heuristic method we use to optimize the decision tree problem. Are there similar methods that are provably optimal?