Chapter 6 - Decision Trees

Material

I have attached a zip file with the files we will discuss at the seminar. It includes code examples (to visualize decision trees) as well as a PDF discussing topics and summarizing the chapter. The zip file can be found here.

Summary

Decision trees are a class of machine learning algorithms that are powerful predictors, especially when combined into random forests (see Chapter 7 - Ensemble Learning and Random Forests).

Decision trees are binary, generally unbalanced trees in which each decision node is characterized by a feature $k$ and a threshold $t_k$.
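
A minimal sketch (assuming scikit-learn and its bundled iris dataset) of how the learned $(k, t_k)$ pairs can be read off a fitted tree:

    from sklearn.datasets import load_iris
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_iris(return_X_y=True)
    clf = DecisionTreeClassifier(max_depth=2, random_state=42).fit(X, y)

    tree = clf.tree_  # the low-level binary tree structure
    for node in range(tree.node_count):
        if tree.children_left[node] == tree.children_right[node]:
            print(f"node {node}: leaf")
        else:
            print(f"node {node}: split on feature k={tree.feature[node]} "
                  f"at threshold t_k={tree.threshold[node]:.2f}")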

Impurity measurements

The cost function minimized when splitting a node can be described by

$J(k, t_k) = \frac{m_{\text{left}}}{m} I_{\text{left}} + \frac{m_{\text{right}}}{m} I_{\text{right}}$

where $I_{\text{node}}$ is the impurity of the left or right child node, $m_{\text{node}}$ is the number of samples in the left or right child node, and $m$ is the total number of samples in the node being split.

For classification: $I_{\text{node}}$ can be either the Gini impurity, $G_i$, or the Shannon entropy, $H_i$.
For regression: $I_{\text{node}}$ is the mean squared error, $\text{MSE}_{\text{node}}$.
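
A minimal sketch of how the cost of one candidate split follows from this definition (the helper names gini and split_cost are illustrative, not from the chapter or scikit-learn):

    import numpy as np

    def gini(y):
        """Gini impurity of a label vector (defined in the next subsection)."""
        if len(y) == 0:
            return 0.0
        _, counts = np.unique(y, return_counts=True)
        p = counts / counts.sum()
        return 1.0 - np.sum(p ** 2)

    def split_cost(X, y, k, t_k, impurity=gini):
        """J(k, t_k) = (m_left/m) * I_left + (m_right/m) * I_right."""
        left = X[:, k] <= t_k
        m, m_left = len(y), left.sum()
        return (m_left / m) * impurity(y[left]) + ((m - m_left) / m) * impurity(y[~left])

    # A perfect split on a toy dataset has zero cost:
    X = np.array([[1.0], [2.0], [3.0], [4.0]])
    y = np.array([0, 0, 1, 1])
    print(split_cost(X, y, k=0, t_k=2.5))  # 0.0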

Gini Impurity

The Gini impurity is defined as

$G_i = 1 - \sum_{k=1}^{n} p_{i,k}^2$

where $p_{i,k}$ is the ratio of class $k$ samples among all the training samples in the $i$-th node.
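
As a numeric check (assuming NumPy; the class counts are an arbitrary example), consider a node holding 0, 49 and 5 training samples of three classes:

    import numpy as np

    counts = np.array([0, 49, 5])  # class counts in one node
    p = counts / counts.sum()      # p_{i,k}
    print(1.0 - np.sum(p ** 2))    # G_i ~ 0.168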

Shannon Entropy

The Shannon entropy is defined as

$H_i = -\sum_{\substack{k=1 \\ p_{i,k} \neq 0}}^{n} p_{i,k} \log_2(p_{i,k})$.
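
The same example node as in the Gini check above (assuming NumPy); classes with $p_{i,k} = 0$ are skipped so that the logarithm is defined:

    import numpy as np

    counts = np.array([0, 49, 5])
    p = counts / counts.sum()
    p = p[p > 0]                    # sum only over p_{i,k} != 0
    print(-np.sum(p * np.log2(p)))  # H_i ~ 0.445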

Mean Squared Error (MSE)

The mean squared error is given by

$\text{MSE}_{\text{node}} = \sum_{i \in \text{node}} \left( \hat{y}_{\text{node}} - y^{(i)} \right)^2, \quad \text{where} \quad \hat{y}_{\text{node}} = \frac{1}{m_{\text{node}}} \sum_{i \in \text{node}} y^{(i)}$
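
As a quick numeric check (the target values are made up for illustration), the node's prediction is the mean of its targets and the impurity is the squared error around it:

    import numpy as np

    y_node = np.array([2.0, 3.0, 7.0])    # hypothetical targets in one node
    y_hat = y_node.mean()                 # y_hat_node = mean of the y^(i) -> 4.0
    print(np.sum((y_hat - y_node) ** 2))  # MSE_node as defined above -> 14.0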


Regularization

  • max_depth: Limits the maximum depth of the decision tree.
  • min_samples_split: Minimum number of samples a node must have before it can be split.
  • min_samples_leaf: Minimum number of samples a leaf node must have.
  • min_weight_fraction_leaf: Same as min_samples_leaf, but expressed as a fraction of the total number of weighted samples.
  • max_leaf_nodes: Maximum number of leaf nodes allowed.
  • max_features: Maximum number of features evaluated for splitting a node.
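
These hyperparameter names come from scikit-learn's DecisionTreeClassifier; a usage sketch (the moons dataset and the specific values are arbitrary examples):

    from sklearn.datasets import make_moons
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_moons(n_samples=200, noise=0.25, random_state=42)

    reg_tree = DecisionTreeClassifier(
        max_depth=4,           # upper bound on the depth of the tree
        min_samples_split=10,  # a node needs at least 10 samples to be split
        min_samples_leaf=5,    # every leaf must keep at least 5 samples
        max_leaf_nodes=8,      # at most 8 leaves in total
        random_state=42,
    ).fit(X, y)

    print(reg_tree.get_depth(), reg_tree.get_n_leaves())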

Session agenda

  • Summary of the chapter. Slides will be provided shortly.
  • Discussion regarding when decision trees can be a powerful tool.
  • Discussion regarding decision trees' sensitivity to data perturbations.
  • Exercise discussions (Focus: 2, 3, 4; Extra: 7 and 8).

Extra

Try to reflect on the greedy heuristic we use to train decision trees. Are there similar methods that are provably optimal?