FRTN65
Exercise 4
2024 HT/Autumn

Exercise 4.1 Boosting on sonar data

Run the notebook ex04_boosting.ipynb, where you will tweak some code that implements boosting with a weight vector and a simple loop. Then compare it to sklearn's AdaBoost implementation to see how well your version works; you should be able to reach similar performance with rather simple means.
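For orientation, here is a minimal sketch of the kind of weight-vector loop the notebook asks you to complete (decision stumps as weak learners, labels in {-1, +1}, and all variable names are assumptions here; the notebook's own setup will differ):

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    """Plain AdaBoost with decision stumps; labels y must be in {-1, +1}."""
    n = len(y)
    w = np.ones(n) / n  # start with uniform sample weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.sum(w * (pred != y)) / np.sum(w)  # weighted training error
        alpha = 0.5 * np.log((1 - err) / (err + 1e-12))
        w = w * np.exp(-alpha * y * pred)  # up-weight the misclassified points
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    # Weighted majority vote of the weak learners.
    return np.sign(sum(a * s.predict(X) for s, a in zip(stumps, alphas)))

This can then be compared with sklearn.ensemble.AdaBoostClassifier on the same train/test split.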

Exercise 4.2 Singular Value Decomposition

a) Prove that if $U$ is a unitary matrix ($U^TU = UU^T = I$) and $y = Ux$, then $\Vert y \Vert_2^2 = \Vert x \Vert_2^2$, i.e. the sum of squares of the elements is the same for $y$ and $x$ ("unitary matrices don't change lengths").

b) Given a matrix $A$, one way of finding the $U$, $S$ and $V$ in the SVD decomposition $A = USV^T$ is to compute the eigenvectors and eigenvalues of the two matrices $AA^T$ and $A^TA$. Explain why, and how $U$, $S$ and $V$ can be obtained from this information. (This is, however, not the best method numerically.)

c) Use an SVD $X = USV^T$ to rewrite the normal equations $X^TX\theta = X^Ty$ and solve for $\theta$. Show that the solution (if $X^TX$ is invertible) is given by $\widehat{\theta} = V_1 S_1^{-1} U_1^T y$.

d) Same problem but with Tikhonov regularisation, for which $(X^TX + \gamma I_p)\theta = X^Ty$.
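As a starting point for c) and d) (a sketch, leaving the remaining steps to the exercise): substituting $X = USV^T$ into the normal equations and using $U^TU = I$ gives

$$X^TX\theta = X^Ty \;\Longleftrightarrow\; VS^TSV^T\theta = VS^TU^Ty.$$

Multiplying from the left by $V^T$ turns this into a diagonal system; restricting it to the nonzero singular values yields the stated form in terms of $U_1$, $S_1$ and $V_1$. For d), note that $\gamma I_p = V(\gamma I)V^T$ since $V$ is orthogonal, so the left-hand side becomes $V(S^TS + \gamma I)V^T\theta$.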

e) In the lecture we introduced the projections $P_1 = U_1U_1^T$ and $Q_1 = V_1V_1^T$. Prove the claims stated on slides 37-38 (i.e. that $P_1^2 = P_1$ etc.).

f) The SVD can be used to approximate a matrix $A = USV^T$ with a matrix of lower rank. Write Matlab or Python code that takes an image, represented as a matrix of pixel intensities, and calculates an optimal rank-$k$ approximation $A_k$. Try different values of $k = 1, 2, 10, \ldots$ and plot the images $A_k$.

Hint: $A_k = U_1S_1V_1^T$, where $U_1$, $S_1$ and $V_1$ are the parts corresponding to the $k$ largest singular values.
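A minimal Python sketch of f), assuming matplotlib for loading and display and a placeholder file name "image.png" (any grayscale intensity matrix works):

import numpy as np
import matplotlib.pyplot as plt

# Load an image as a matrix of pixel intensities ("image.png" is a placeholder).
A = plt.imread("image.png").astype(float)
if A.ndim == 3:
    A = A.mean(axis=2)  # collapse RGB channels to grayscale

# Thin SVD: A = U @ diag(s) @ Vt, singular values sorted in decreasing order.
U, s, Vt = np.linalg.svd(A, full_matrices=False)

for k in [1, 2, 10, 50]:
    # Optimal rank-k approximation: keep the k largest singular values.
    A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    plt.figure()
    plt.imshow(A_k, cmap="gray")
    plt.title(f"rank k = {k}")
plt.show()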

 

Exercise 4.3 K-means and precoding

Investigate the code ex04_kmeans.ipynb, which uses the following idea: first it clusters the data vectors of length 64, corresponding to flattened 8x8 images of the digits 0-9, into 10 clusters. Then it represents each image as a vector of 10 coordinates, namely the distances to the 10 cluster centers, so each image is now represented by a vector of length 10. It then compares the performance of a logistic regression classifier using these 10 numbers to that of a classifier using the original 64 numbers.

(Extra: you can plot the 10 cluster centers to get visual intuition for what the k-means algorithm is doing.)

Unfortunately, the performance seems to get worse. But the idea can actually be made to work: try to tune the algorithm so that a genuine improvement is obtained. You should be able to get about 1% average improvement, possibly even 2%.
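For reference, a sketch of the precoding pipeline (sklearn's bundled digits data set stands in here; the notebook's own data loading and parameters may differ):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)  # flattened 8x8 digit images, length 64
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# Precoding: describe each image by its distances to the cluster centers.
km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X_tr)
Z_tr, Z_te = km.transform(X_tr), km.transform(X_te)

clf_raw = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
clf_pre = LogisticRegression(max_iter=5000).fit(Z_tr, y_tr)
print("64 raw pixels :", clf_raw.score(X_te, y_te))
print("10 distances  :", clf_pre.score(Z_te, y_te))

The number of clusters is the obvious tuning knob (see the hint at the bottom of this page).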

Exercise 4.4 K-means

Implement your own K-means algorithm (in a language of your choice). Try it on a data set of your choice.
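A minimal sketch of Lloyd's algorithm in Python (random data-point initialisation, Euclidean distance and a fixed iteration cap are choices made here, not requirements):

import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    # Initialise the centers as k randomly chosen data points.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        dist = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Update step: each center moves to the mean of its points.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break  # converged
        centers = new_centers
    return centers, labels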

 

Exercise 4.5 LDA

Prove that the decision regions in LDA are bounded by linear functions.

Hint: The decision boundary between two classes is most easily found by using the fact that the LDA classifier chooses the class $i$ with the highest posterior probability, which according to Bayes' rule is proportional to the likelihood scaled by the prior (the probability of data belonging to class $i$):

$\log p(y=i \mid x_\star) \sim \log\big(p(x_\star \mid y=i)\, p(y=i)\big) = \mathrm{const}_i - \frac{1}{2}(x_\star - \mu_i)^T \Sigma^{-1} (x_\star - \mu_i).$
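A sketch of the key observation: on the boundary between classes $i$ and $j$ the two log-posteriors are equal, and since LDA uses the same $\Sigma$ for all classes the quadratic term $x_\star^T \Sigma^{-1} x_\star$ appears on both sides and cancels, leaving

$$(\mu_i - \mu_j)^T \Sigma^{-1} x_\star = c_{ij}$$

for some constant $c_{ij}$, which is an equation linear in $x_\star$.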

 

Exercise 4.6 PCA and MNIST

Investigate the code ex04_pcaMNIST.ipynb.

a) How far can you compress the images before classifier performance deteriorates?

b) In terms of representing as much variance as possible with as few components as possible, was it a good idea to apply the standard scaler to each pixel? Try it without.
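A sketch of the kind of sweep a) calls for (sklearn's small digits set stands in for MNIST here, which is an assumption; the notebook's pipeline will differ):

import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for n in [2, 5, 10, 20, 40, 64]:
    pca = PCA(n_components=n).fit(X_tr)
    clf = LogisticRegression(max_iter=5000).fit(pca.transform(X_tr), y_tr)
    acc = clf.score(pca.transform(X_te), y_te)
    var = pca.explained_variance_ratio_.sum()
    print(f"{n:3d} components: {var:.2f} of variance kept, accuracy {acc:.3f}")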

 

 

Solutions: sol4new.pdf

Hint for Exercise 4.3: try n_clusters=100.
