FRTN65
Exercise 2
2022 HT/Autumn


Exercise 2.1

In the lecture we considered a least squares model

$$y_i = \mathbf{x}_i^T \theta + \text{noise}, \quad i = 1, \ldots, N$$

where each observation $y_i$ is a real number and where $\mathbf{x}_i$ and $\theta$ are column vectors of length $p$.

If each observation $\mathbf{y}_i$ instead has $m$ real components, stored in a row vector $\mathbf{y}_i$ of length $m$, we can use a model of the form

$$\mathbf{y}_i = \mathbf{x}_i^T \theta + \text{noise}, \quad i = 1, \ldots, N$$

where $\theta$ is a matrix of size $p \times m$.

Describe why the formula

$$\widehat{\theta} = (X^T X)^{-1} X^T Y$$

still gives the least squares optimal parameters if you stack the observations into a matrix $Y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$ of size $N \times m$.
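
As a sanity check (not part of the exercise statement), here is a minimal NumPy sketch, with made-up dimensions, verifying that the stacked formula reproduces the $m$ separate single-output least squares fits:

```python
# Numerical check: the stacked formula (X^T X)^{-1} X^T Y equals
# m column-by-column ordinary least squares solutions.
import numpy as np

rng = np.random.default_rng(0)
N, p, m = 100, 3, 4                      # N observations, p features, m outputs
X = rng.normal(size=(N, p))              # rows are x_i^T
Theta_true = rng.normal(size=(p, m))     # p-by-m parameter matrix
Y = X @ Theta_true + 0.1 * rng.normal(size=(N, m))   # rows are y_i

# Stacked solution: one p-by-p system, all m output columns at once.
Theta_hat = np.linalg.solve(X.T @ X, X.T @ Y)

# Column-by-column solution: m separate single-output LS fits.
Theta_cols = np.column_stack(
    [np.linalg.lstsq(X, Y[:, j], rcond=None)[0] for j in range(m)]
)

print(np.allclose(Theta_hat, Theta_cols))   # True: the two agree
```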

 

Exercise 2.2 - Classification by Logistic Regression - small digit images

The code LogisticRegression_smalldigits.ipynb does classification of images of digits (0-9).

a) Experiment with different parameters to the logistic_regression() function. You should get an accuracy of at least 95%.

b) Study the confusion matrix and choose a certain mis-classified image (look for a square with a 1 in it). Find the corresponding image in the dataset and save it to a file on your computer. (This is a practical exercise in handling data.)
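
A self-contained sketch of the kind of workflow b) asks for, assuming scikit-learn's built-in 8x8 digits dataset (the actual notebook may organize this differently; all variable names here are illustrative):

```python
# Fit logistic regression on sklearn's digits, inspect the confusion matrix,
# then save one misclassified test image to disk.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=5000).fit(X_train, y_train)
y_pred = clf.predict(X_test)
print(confusion_matrix(y_test, y_pred))     # off-diagonal entries are errors

# Grab the first misclassified test image and save it as a PNG.
wrong = np.where(y_pred != y_test)[0]
img = X_test[wrong[0]].reshape(8, 8)        # digits images are 8x8 pixels
plt.imsave("misclassified_digit.png", img, cmap="gray")
print(f"true={y_test[wrong[0]]}, predicted={y_pred[wrong[0]]}")
```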

 

Exercise 2.3 - Recursive Least Squares

Consider LS estimation of $\theta$ based on data from the model

$$y_i = x_i^T \theta + e_i, \quad i = 1, \ldots, N$$

Following the lecture, we know the estimate based on the first $N$ data points is given by $\widehat{\theta}_N = (X_N^T X_N)^{-1} X_N^T \mathbf{y}_N$. The matrix that gets inverted is of size $p \times p$, where $p$ is the number of features in $x$.

a) When a new data point $y_{N+1} = x_{N+1}^T \theta + e_{N+1}$ arrives, the updated estimate is given by $\widehat{\theta}_{N+1} = (X_{N+1}^T X_{N+1})^{-1} X_{N+1}^T \mathbf{y}_{N+1}$. Show that the following generates the correct result, using the notation $P_N = (X_N^T X_N)^{-1}$ and $s_N = X_N^T \mathbf{y}_N$.

\begin{align*}
P_{N+1}^{-1} &= P_N^{-1} + x_{N+1} x_{N+1}^T \\
s_{N+1} &= s_N + x_{N+1} y_{N+1} \\
\widehat{\theta}_{N+1} &= P_{N+1} s_{N+1} \\
&= \widehat{\theta}_N + P_{N+1} x_{N+1} \left( y_{N+1} - x_{N+1}^T \widehat{\theta}_N \right)
\end{align*}

b) The inversion of a $p \times p$ matrix in each iteration can actually be avoided by a clever trick. Show that the first equation in a) can be rewritten as

$$P_{N+1} = P_N - \frac{P_N x_{N+1} x_{N+1}^T P_N}{1 + x_{N+1}^T P_N x_{N+1}}$$

which avoids the matrix inverse. (The trick is a special case of the so-called matrix inversion lemma.)
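
A minimal sketch verifying the recursion numerically, using the updates from a) and b) exactly as stated (the initialization from a small starting batch is an assumption, chosen so that $P$ exists):

```python
# Recursive least squares: propagate P_N = (X_N^T X_N)^{-1} with the
# rank-one update from b), so no matrix is inverted inside the loop.
import numpy as np

rng = np.random.default_rng(1)
p, N = 3, 200
theta_true = rng.normal(size=p)
X = rng.normal(size=(N, p))
y = X @ theta_true + 0.1 * rng.normal(size=N)

n0 = 10                                   # initialize from a small batch
P = np.linalg.inv(X[:n0].T @ X[:n0])
theta = P @ X[:n0].T @ y[:n0]

for i in range(n0, N):
    x = X[i]
    P = P - np.outer(P @ x, x @ P) / (1 + x @ P @ x)   # matrix inversion lemma
    theta = theta + P @ x * (y[i] - x @ theta)          # innovation update

theta_batch = np.linalg.solve(X.T @ X, X.T @ y)
print(np.allclose(theta, theta_batch))    # True: recursion matches batch LS
```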

 

Exercise 2.4 Handling Categorical Variables

Study the code Categorical_variables.ipynb and how the categorical variable 'zone' is transformed to numerical form using either OneHotEncoding or numerical LabelEncoding. What could be the advantages and disadvantages of the two methods?
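
A toy illustration of the difference, with made-up 'zone' values (the notebook's actual data may differ):

```python
# Two ways to turn a categorical column into numbers.
import pandas as pd
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

df = pd.DataFrame({"zone": ["north", "south", "east", "south"]})

# LabelEncoder: one integer column, but it imposes an artificial
# ordering east < north < south that the model may pick up on.
print(LabelEncoder().fit_transform(df["zone"]))              # [1 2 0 2]

# OneHotEncoder: one binary column per category, no fake ordering,
# but the number of features grows with the number of categories.
print(OneHotEncoder().fit_transform(df[["zone"]]).toarray())
```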

 

Exercise 2.5 Weighted Least Squares

If different data points have different reliability, it is natural to introduce weights $w_i$ in the loss function, so that $\theta$ should minimize

$$J_{WLS}(\theta) = \sum_{i=1}^N (y_i - \theta^T x_i)^2 w_i^2$$

a) Would you choose $w_i$ small or large if the data $(y_i, x_i)$ is unreliable?

b) Prove that $J_{WLS}(\theta)$ is minimized by

$$\theta = (X^T W X)^{-1} X^T W Y$$

where $W$ is the diagonal matrix $W = \mathrm{diag}(w_1^2, \ldots, w_N^2)$. [Hint: Consider an equivalent problem with data $(w_i y_i, w_i x_i)$.]
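
A numerical sketch of the hint, checking that ordinary LS on the rescaled data $(w_i y_i, w_i x_i)$ agrees with the weighted formula (the data here is random, made up for the check):

```python
# Weighted least squares: formula vs. the rescaled-data trick from the hint.
import numpy as np

rng = np.random.default_rng(2)
N, p = 50, 3
X = rng.normal(size=(N, p))
y = X @ rng.normal(size=p) + rng.normal(size=N)
w = rng.uniform(0.1, 2.0, size=N)          # per-sample weights w_i

W = np.diag(w**2)                          # W = diag(w_1^2, ..., w_N^2)
theta_wls = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Equivalent problem: scale row i of X and y_i by w_i, then plain LS.
Xw, yw = X * w[:, None], y * w
theta_scaled = np.linalg.lstsq(Xw, yw, rcond=None)[0]

print(np.allclose(theta_wls, theta_scaled))   # True
```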

 

Exercise 2.6 Gradient of logistic regression function

Find a formula for the derivatives $\frac{\partial J}{\partial \theta_j},\ j = 1, \ldots, p$ of the logistic regression loss function $J(\theta) = \sum_{i=1}^{n_{\textrm{data}}} \ln(1 + e^{-y_i \theta^T \mathbf{x}_i})$. Here $\mathbf{x}_i$ denotes a column vector containing the $p$ inputs for data point $i$. (This calculation is useful if you want to write your own optimization code using gradient descent.)
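
If you want to check your derived formula without spoiling the exercise, a finite-difference harness like the following can serve as a test (all data here is synthetic; fill in grad_J with your own expression):

```python
# Numerical gradient check for the logistic regression loss above.
import numpy as np

rng = np.random.default_rng(3)
n, p = 20, 4
X = rng.normal(size=(n, p))                 # rows are x_i^T
y = rng.choice([-1.0, 1.0], size=n)         # labels in {-1, +1}, as in the loss

def J(theta):
    """Logistic regression loss from the exercise."""
    return np.sum(np.log(1 + np.exp(-y * (X @ theta))))

def grad_J(theta):
    """Insert your derived formula here, one component per theta_j."""
    raise NotImplementedError

# Central finite differences, component by component.
theta = rng.normal(size=p)
eps = 1e-6
num_grad = np.array(
    [(J(theta + eps * np.eye(p)[j]) - J(theta - eps * np.eye(p)[j])) / (2 * eps)
     for j in range(p)]
)
print(num_grad)                             # your grad_J(theta) should match this
```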

 

Exercise 2.7 Investigating the Titanic Survival Dataset

Investigate this data using the code titanic_analysis.ipynb. You need to download the files titanic_train.csv and titanic_test.csv and upload these to your Google Colab session. The data has a lot of missing entries and other drawbacks, which are handled during some preprocessing steps.

Three different methods, Logistic Regression, KNN and Random Forests (which we study in Lecture 3), are then used to predict survival of the different passengers, depending on age, sex, passenger class, etc.

a) Describe what factors increased survival probability.

b) The prediction performance of the three methods is evaluated on the training data. We know this is not a reliable method. Change the code to use 5-fold cross-validation instead. Comment on the results.
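
A hedged sketch of the change for b), assuming the notebook exposes preprocessed features X and labels y (hypothetical names; adapt to the actual variables in the code):

```python
# Replace train-set accuracy with 5-fold cross-validation.
from sklearn.model_selection import cross_val_score
from sklearn.linear_model import LogisticRegression

# X, y are assumed to come from the notebook's preprocessing steps.
scores = cross_val_score(LogisticRegression(max_iter=1000), X, y, cv=5)
print(scores.mean(), scores.std())   # mean accuracy over the 5 held-out folds
```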

Note: To help you understand the Pandas toolbox further, you might want to watch this

10 minute guide to pandas (it's more like 30 min)

or scan some of the examples on pythonexamples.org/pandas-examples

 

Solutions: sol2.pdf
