Chapter 3 - Classification

Responsible for the session: Frida Heskebeck

Chapter summary

Classification tasks

Binary classification - Which of these two classes does the sample belong to?

Multiclass classification - Which of more than two classes does the sample belong to?

Multilabel classification - Each sample has several binary labels.

Multioutput classification - Each sample has several labels, each of which can take more than two values.
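As a rough illustration, the four task types differ mainly in what the targets look like. The sketch below uses made-up labels (spam/ham, animals, photo tags) and a hypothetical helper, `describe_task`, which guesses the task type from the targets; neither is taken from the chapter.

```python
# Hypothetical targets for three samples each; all labels are made up.

# Binary: one label per sample, two possible values.
y_binary = [0, 1, 1]  # e.g. 0 = ham, 1 = spam

# Multiclass: one label per sample, more than two possible values.
y_multiclass = ["cat", "dog", "bird"]

# Multilabel: several binary labels per sample.
y_multilabel = [
    [1, 0, 1],  # columns: has_cat, has_dog, outdoors
    [0, 1, 1],
    [0, 0, 0],
]

# Multioutput: several labels per sample, each with more than two values.
y_multioutput = [
    ["cat", "small"],  # columns: animal, size
    ["dog", "large"],
    ["bird", "medium"],
]


def describe_task(y):
    """Guess the task type from the targets (a heuristic based on observed values)."""
    rows = [row if isinstance(row, list) else [row] for row in y]
    n_outputs = len(rows[0])
    # Number of distinct values seen in each output column.
    n_values = [len({row[i] for row in rows}) for i in range(n_outputs)]
    many_classes = any(n > 2 for n in n_values)
    if n_outputs == 1:
        return "multiclass" if many_classes else "binary"
    return "multioutput" if many_classes else "multilabel"


for name, y in [("y_binary", y_binary), ("y_multiclass", y_multiclass),
                ("y_multilabel", y_multilabel), ("y_multioutput", y_multioutput)]:
    print(name, "->", describe_task(y))
```

The helper only looks at the values that happen to occur in the data, so it is a reading aid rather than a reliable detector.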

 

Performance measures

Accuracy - Fraction of all predictions that are correct.

Confusion matrix - Rows: actual classes, columns: predicted classes. The entries on the diagonal count the correctly classified samples.

Precision - True positives divided by all samples predicted as positive, TP / (TP + FP) (one column of the confusion matrix).

Recall - True positives divided by all samples that actually belong to the positive class, TP / (TP + FN) (one row of the confusion matrix).

F1 score - The harmonic mean of precision and recall; it is high only when both are high.

ROC curve - The true positive rate (recall) plotted against the false positive rate as the decision threshold varies.
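The measures above can be worked through by hand on a small binary example. The sketch below is plain Python with made-up labels and scores; the function names `confusion_counts` and `roc_points` are hypothetical helpers written for this note, not library calls.

```python
def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels, where 1 is the positive class."""
    tn = fp = fn = tp = 0
    for actual, predicted in zip(y_true, y_pred):
        if actual == 1 and predicted == 1:
            tp += 1
        elif actual == 1:
            fn += 1
        elif predicted == 1:
            fp += 1
        else:
            tn += 1
    return tn, fp, fn, tp


def roc_points(y_true, scores):
    """(FPR, TPR) pairs obtained by thresholding at each distinct score."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        y_pred = [1 if s >= threshold else 0 for s in scores]
        tn, fp, fn, tp = confusion_counts(y_true, y_pred)
        points.append((fp / n_neg, tp / n_pos))
    return points


# Made-up data: 4 positive and 6 negative samples with classifier scores.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # decision threshold 0.5

tn, fp, fn, tp = confusion_counts(y_true, y_pred)
confusion_matrix = [[tn, fp],   # row: actual negative
                    [fn, tp]]   # row: actual positive

accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)   # positive column of the confusion matrix
recall = tp / (tp + fn)      # positive row of the confusion matrix
f1 = 2 * precision * recall / (precision + recall)

print("confusion matrix:", confusion_matrix)
print("accuracy:", accuracy, "precision:", precision, "recall:", recall, "F1:", f1)
print("ROC points:", roc_points(y_true, scores))
```

scikit-learn offers equivalents in `sklearn.metrics` (for example `confusion_matrix`, `precision_score`, `recall_score`, `f1_score`, and `roc_curve`); the manual version is only meant to make the definitions concrete.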

It can be useful to look at pairwise scatter plots, like this (see seaborn's PairGrid):

scatterplot.png

Additional resources

https://www.youtube.com/watch?v=0Lt9w-BxKFQ - Intro to scikit-learn; the presenter also uses Jupyter.

https://www.youtube.com/watch?v=44jq6ano5n0&list=PLQVvvaa0QuDfKTOs3Keq_kaG2P55YRn5v&index=13 - Videos on many different topics. I will save this one for future reference.

 

Session Agenda

The plan for the meeting:

  • Summary of the chapter.
  • Discussion of examples that we all bring to the meeting (e.g., detecting skin cancer from photos, spam mail classification, and so on)
    • Which type of classification task is it?
    • Which metric would be useful for measuring performance? What would the target performance be?
  • We will discuss tasks 1 and 2 with the following focus:
    • 1 - What is the general workflow (recap of chapter 2)? How do we guess suitable hyperparameters to try? How time-consuming is it? What difficulties did you have with the task?
    • 2 - What data augmentation could be applied to the data in your research or to the examples we discussed earlier? What is the purpose of data augmentation? What difficulties did you have with the task?
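To make the data augmentation question concrete, one common approach for image data is to add slightly shifted copies of each training image with the same label. The sketch below assumes small 2-D grayscale images stored as NumPy arrays; the helpers `shift_image` and `augment_with_shifts` and the toy data are made up for illustration, and whether this matches what task 2 asks for is an assumption.

```python
import numpy as np


def shift_image(image, dx, dy):
    """Shift a 2-D image by (dx, dy) pixels, filling the uncovered border with zeros."""
    height, width = image.shape
    shifted = np.zeros_like(image)
    # Source and destination windows that still overlap after the shift.
    src_rows = slice(max(0, -dy), min(height, height - dy))
    dst_rows = slice(max(0, dy), min(height, height + dy))
    src_cols = slice(max(0, -dx), min(width, width - dx))
    dst_cols = slice(max(0, dx), min(width, width + dx))
    shifted[dst_rows, dst_cols] = image[src_rows, src_cols]
    return shifted


def augment_with_shifts(images, labels):
    """Return the original images plus one-pixel shifts right, left, down and up."""
    shifts = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    aug_images = list(images)
    aug_labels = list(labels)
    for dx, dy in shifts:
        for image, label in zip(images, labels):
            aug_images.append(shift_image(image, dx, dy))
            aug_labels.append(label)  # a shifted image keeps its label
    return np.array(aug_images), np.array(aug_labels)


# Toy data: two 3x3 "images" with made-up labels.
images = np.arange(2 * 3 * 3).reshape(2, 3, 3)
labels = np.array([7, 3])

X_aug, y_aug = augment_with_shifts(images, labels)
print(X_aug.shape, y_aug.shape)  # five times as many samples as before
print(shift_image(images[0], dx=1, dy=0))
```

The purpose is to show the model variations it should be insensitive to. The augmented copies belong in the training set only, so that the test set still measures performance on untouched data.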

 

What you need to bring to the meeting:
Examples of classification tasks (e.g., detecting skin cancer from photos) that we can discuss during the session. They could be examples from your research, from your everyday life, or other problems you have heard of.