# Exercise 2

- Due No Due Date
- Points None

**Exercise 2.1**

In the lecture we considered a least squares model

$$y_i = x_i^T \theta + e_i,$$

where each observation $y_i$ is a real number and where $x_i$ and $\theta$ are column vectors of length $p$.

If each observation instead has $q$ real components, stored in a row vector $y_i$ of length $q$, we can use a model of the form

$$y_i = x_i^T \Theta + e_i,$$

where $\Theta$ is a matrix of size $p \times q$.

Describe why the formula

$$\widehat{\Theta} = (X^T X)^{-1} X^T Y$$

still gives the least squares optimal parameters if you stack the observations into a matrix $Y$ of size $n \times q$.
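As a quick numerical sanity check of this claim, the following sketch (using NumPy, with illustrative sizes and variable names) computes the stacked normal-equations solution and verifies that each of its columns agrees with the ordinary scalar least squares fit for the corresponding output column:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, q = 50, 3, 4                         # n observations, p features, q outputs
X = rng.standard_normal((n, p))
Y = X @ rng.standard_normal((p, q)) + 0.1 * rng.standard_normal((n, q))

# Multi-output formula: Theta_hat = (X^T X)^{-1} X^T Y
Theta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Column by column: each output is an ordinary scalar LS problem
cols = np.column_stack([np.linalg.lstsq(X, Y[:, j], rcond=None)[0] for j in range(q)])
print(np.allclose(Theta_hat, cols))        # prints True
```

The agreement holds because the multi-output loss separates into $q$ independent scalar least squares problems sharing the same $X$.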

**Exercise 2.2 - Classification by Logistic Regression - small digit images**

The code LogisticRegression_smalldigits.ipynb classifies small images of handwritten digits (0-9).

a) Experiment with different parameters to the logistic_regression() function. You should get an accuracy of at least 95%.

b) Study the confusion matrix and choose a certain mis-classified image (look for an off-diagonal cell containing a 1). Find the corresponding image in the dataset and save it to a file on your computer. (This is a practical exercise in handling data.)
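The notebook's own logistic_regression() helper is not reproduced here; as a stand-in, this sketch uses scikit-learn's LogisticRegression on the load_digits dataset (an assumption about which dataset the notebook uses) and shows the pattern for part b): tracking dataset indices through the split, locating a mis-classified test image, and saving it to a file.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")                       # write image files without a display
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

digits = load_digits()                      # 8x8 grey-scale digit images
idx = np.arange(len(digits.target))         # keep track of dataset indices
X_tr, X_te, y_tr, y_te, _, idx_te = train_test_split(
    digits.data, digits.target, idx, test_size=0.3, random_state=0)

clf = LogisticRegression(max_iter=5000)     # generous max_iter for convergence
clf.fit(X_tr, y_tr)
pred = clf.predict(X_te)
acc = accuracy_score(y_te, pred)
cm = confusion_matrix(y_te, pred)           # rows: true label, cols: predicted

# Part b) pattern: find one mis-classified image and save it to disk
wrong = np.flatnonzero(pred != y_te)
if wrong.size:
    i = wrong[0]
    print(f"dataset index {idx_te[i]}: true {y_te[i]}, predicted {pred[i]}")
    plt.imsave("misclassified_digit.png", X_te[i].reshape(8, 8), cmap="gray")
```

Carrying the index array through train_test_split is what lets you map a row of the test set back to its position in the full dataset.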

**Exercise 2.3 - Recursive Least Squares**

Consider LS estimation of $\theta$ based on data from the model

$$y_i = x_i^T \theta + e_i, \quad i = 1, 2, \ldots$$

Following the lecture, we know the estimate based on the first $n$ data points is given by $\hat{\theta}_n = (X_n^T X_n)^{-1} X_n^T Y_n$. The matrix that gets inverted is of size $p \times p$, where $p$ is the number of features in $x$.

a) When a new data point $(x_{n+1}, y_{n+1})$ arrives, the updated estimate is given by $\hat{\theta}_{n+1} = (X_{n+1}^T X_{n+1})^{-1} X_{n+1}^T Y_{n+1}$. Show that the following generates the correct result, using the notation $R_n = X_n^T X_n$ and $P_n = R_n^{-1}$:

$$R_{n+1} = R_n + x_{n+1} x_{n+1}^T, \qquad \hat{\theta}_{n+1} = \hat{\theta}_n + R_{n+1}^{-1} x_{n+1} \left( y_{n+1} - x_{n+1}^T \hat{\theta}_n \right).$$

b) The inversion of a $p \times p$ matrix in each iteration can actually be avoided by a clever trick. Show that the first equation in a) can be rewritten as

$$P_{n+1} = P_n - \frac{P_n x_{n+1} x_{n+1}^T P_n}{1 + x_{n+1}^T P_n x_{n+1}},$$

which avoids the matrix inverse. (The trick is a special case of the so-called matrix inversion lemma.)
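A minimal sketch on synthetic data (sizes and names are illustrative), showing that the inverse-free rank-one update reproduces the batch least squares estimate:

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 3, 200
theta_true = rng.standard_normal(p)
X = rng.standard_normal((n, p))
y = X @ theta_true + 0.05 * rng.standard_normal(n)

# Initialize from the first p points with one batch solve
P = np.linalg.inv(X[:p].T @ X[:p])          # P_n = (X_n^T X_n)^{-1}
theta = P @ X[:p].T @ y[:p]

for i in range(p, n):
    x = X[i]
    Px = P @ x
    # Rank-one update of the inverse (matrix inversion lemma): no p x p inverse
    P = P - np.outer(Px, Px) / (1.0 + x @ Px)
    # Correct the estimate using the new point's prediction error
    theta = theta + P @ x * (y[i] - x @ theta)

batch = np.linalg.solve(X.T @ X, X.T @ y)   # batch LS on all n points
print(np.allclose(theta, batch))            # prints True
```

Each iteration costs $O(p^2)$ instead of the $O(p^3)$ of a fresh matrix inversion, which is the point of the recursive formulation.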

**Exercise 2.4 Handling Categorical Variables**

Study the code Categorical_variables.ipynb and how the categorical variable 'zone' is transformed to numerical form using either OneHotEncoding or numerical LabelEncoding. What could be advantages and disadvantages of the two methods?
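To experiment with the two encodings outside the notebook, here is a minimal sketch with a made-up 'zone' column (the actual notebook data may differ), using scikit-learn's LabelEncoder and pandas get_dummies as a one-hot encoding:

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Made-up 'zone' column standing in for the notebook's categorical variable
df = pd.DataFrame({"zone": ["north", "south", "east", "south", "north"]})

# LabelEncoding: one compact integer column, but it imposes an artificial
# ordering (east < north < south) that a linear model may exploit wrongly
labels = LabelEncoder().fit_transform(df["zone"])
print(labels)                               # categories sorted alphabetically

# One-hot encoding: one binary column per category, no false ordering,
# but the feature count grows with the number of categories
onehot = pd.get_dummies(df["zone"])
print(onehot)
```

Comparing the two outputs makes the trade-off concrete: one column versus one column per category.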

**Exercise 2.5 Weighted Least Squares**

If different data points have different reliability, it is natural to introduce weights $w_i > 0$ in the loss function, so that $\hat{\theta}$ should minimize

$$V(\theta) = \sum_{i=1}^{n} w_i \left( y_i - x_i^T \theta \right)^2.$$

a) Would you choose $w_i$ small or large if the data point $(x_i, y_i)$ is unreliable?

b) Prove that $V(\theta)$ is minimized by

$$\hat{\theta} = (X^T W X)^{-1} X^T W Y,$$

where $W$ is the diagonal matrix $W = \operatorname{diag}(w_1, \ldots, w_n)$. [Hint: Consider an equivalent unweighted problem with data $\tilde{y}_i = \sqrt{w_i}\, y_i$, $\tilde{x}_i = \sqrt{w_i}\, x_i$.]
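The hint can be checked numerically before attempting the proof. This sketch (synthetic data, illustrative names) compares the weighted normal-equations solution with an ordinary least squares fit on the rescaled data:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 3
X = rng.standard_normal((n, p))
w = rng.uniform(0.1, 2.0, n)                # weights: larger = more reliable
y = X @ np.array([1.0, -2.0, 0.5]) + rng.standard_normal(n) / np.sqrt(w)

# Weighted normal equations: theta = (X^T W X)^{-1} X^T W y
W = np.diag(w)
theta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Equivalent unweighted problem with rescaled data sqrt(w_i) * (x_i, y_i)
Xs = np.sqrt(w)[:, None] * X
ys = np.sqrt(w) * y
theta_scaled = np.linalg.lstsq(Xs, ys, rcond=None)[0]
print(np.allclose(theta_wls, theta_scaled))  # prints True
```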

**Exercise 2.6 Gradient of logistic regression function**

Find a formula for the derivatives $\partial J / \partial \theta_j$ of the logistic regression loss function

$$J(\theta) = -\sum_{i=1}^{n} \left( y_i \log \sigma(x_i^T \theta) + (1 - y_i) \log \left( 1 - \sigma(x_i^T \theta) \right) \right).$$

Here $x_i$ denotes a column vector containing the inputs for data point $i$. (This calculation is useful if you want to write your own optimization code using gradient descent.)
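Once you have a candidate formula, a finite-difference check is a reliable way to test it. This sketch (synthetic data; the gradient formula used here is the standard result for this loss) compares the analytic gradient with central differences:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(theta, X, y):
    # Negative log-likelihood of the logistic model
    p = sigmoid(X @ theta)
    return -np.sum(y * np.log(p) + (1 - y) * np.log(1 - p))

def grad(theta, X, y):
    # Candidate formula: sum_i (sigma(x_i^T theta) - y_i) x_i
    return X.T @ (sigmoid(X @ theta) - y)

rng = np.random.default_rng(3)
n, d = 40, 3
X = rng.standard_normal((n, d))
y = (rng.uniform(size=n) < 0.5).astype(float)
theta = rng.standard_normal(d)

# Central finite differences approximate each partial derivative
eps = 1e-6
num = np.array([(loss(theta + eps * e, X, y) - loss(theta - eps * e, X, y)) / (2 * eps)
                for e in np.eye(d)])
print(np.allclose(grad(theta, X, y), num, atol=1e-4))  # prints True
```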

**Exercise 2.7 Investigating the Titanic Survival Dataset**

Investigate this data using the code titanic_analysis.ipynb. You need to download the files titanic_train.csv and titanic_test.csv and upload these to your Google Colab session. The data has a lot of missing entries and other drawbacks, which are handled during some preprocessing steps.

Three different methods, Logistic Regression, KNN and Random Forests (which we study in Lec3), are then used to predict survival of the different passengers, depending on age, sex, passenger class, etc.

a) Describe which factors increased survival probability.

b) The prediction performance of the three methods is evaluated on the training data. We know this is not a reliable method. Change the code to use 5-fold cross-validation instead. Comment on the results.
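For part b), scikit-learn's cross_val_score gives the 5-fold pattern directly. The sketch below uses placeholder synthetic data; in the notebook, X and y would come from the preprocessed Titanic DataFrame instead:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

# Placeholder data; in the notebook X and y come from the preprocessed
# Titanic DataFrame (features like age, sex, passenger class)
rng = np.random.default_rng(4)
X = rng.standard_normal((300, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + 0.3 * rng.standard_normal(300) > 0).astype(int)

models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(random_state=0),
}

# cross_val_score replaces scoring on the training data itself:
# each model is fitted 5 times, each time tested on a held-out fold
scores = {name: cross_val_score(m, X, y, cv=5).mean() for name, m in models.items()}
for name, s in scores.items():
    print(f"{name}: {s:.3f}")
```

Expect the cross-validated scores to be lower than training-set accuracy, especially for flexible models such as Random Forests.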

Note: To help you understand the Pandas toolbox further, you might want to watch this

10 minute guide to pandas (it's more like 30 min)

or scan some of the examples on pythonexamples.org/pandas-examples

**Solutions: sol2.pdf**