FRTN65
Exercise 4
2024 HT/Autumn

Exercise 4.1 Boosting on sonar data

Run the notebook ex04_boosting.ipynb, where you will tweak some code that implements boosting with a weight vector and a simple loop. Then compare it to sklearn's AdaBoost implementation and see how well you got it to work; you should be able to reach similar performance with rather simple means.
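Not the notebook's code, but a minimal sketch of the kind of weighted loop AdaBoost uses (assuming labels coded as -1/+1 and depth-1 decision trees as weak learners); it should come close to sklearn's AdaBoostClassifier on the sonar data:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    """Minimal AdaBoost with decision stumps; y must take values in {-1, +1}."""
    y = np.asarray(y)
    w = np.full(len(y), 1.0 / len(y))             # sample weights, start uniform
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.clip(np.sum(w * (pred != y)), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err)     # vote weight of this weak learner
        w = w * np.exp(-alpha * y * pred)         # upweight misclassified samples
        w = w / w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    votes = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(votes)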

Exercise 4.2 Singular Value Decomposition

a) Prove that if $U$ is a unitary matrix ($U^TU = UU^T = I$) and $y = Ux$, then $\|y\|_2^2 = \|x\|_2^2$, i.e. the sum of squares of the elements is the same for $y$ and $x$ ("Unitary matrices don't change lengths").

b) Given a matrix $A$, one way of finding the $U$, $S$ and $V$ in the SVD decomposition $A = USV^T$ is to compute the eigenvectors and eigenvalues of the two matrices $AA^T$ and $A^TA$. Explain why this works, and how $U$, $S$ and $V$ can be obtained from this information. (This is, however, not the best method in practice.)

c) Use an SVD $X = USV^T$ to rewrite the normal equations $X^TX\theta = X^Ty$, and solve for $\theta$. Show that the solution (if $X^TX$ is invertible) is given by $\hat{\theta} = V_1 S_1^{-1} U_1^T y$.

d) The same problem, but with Tikhonov regularisation, for which $(X^TX + \gamma I_p)\theta = X^Ty$.

e) In the lecture we introduced the projections $P_1 = U_1U_1^T$ and $Q_1 = V_1V_1^T$. Prove the claims stated on slides 37-38 (i.e. that $P_1^2 = P_1$, etc.).

f) The SVD can be used to approximate a matrix $A = USV^T$ with a matrix of lower rank. Write Matlab or Python code that takes an image, represented as a matrix of pixel intensities, and calculates an optimal rank-$k$ approximation $A_k$. Try different values of $k = 1, 2, 10, \ldots$ and plot the images $A_k$.

Hint: $A_k = U_1 S_1 V_1^T$, where $U_1$, $S_1$ and $V_1$ are the parts corresponding to the $k$ largest singular values.
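A minimal Python sketch of this, assuming the image is available as a grayscale 2-D array (the file name below is only a placeholder):

import numpy as np
import matplotlib.pyplot as plt

def rank_k_approx(A, k):
    # Thin SVD: A = U @ diag(s) @ Vt, singular values sorted in decreasing order.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the k largest singular values: A_k = U1 S1 V1^T.
    return (U[:, :k] * s[:k]) @ Vt[:k, :]

img = plt.imread("some_image.png")               # placeholder file name
if img.ndim == 3:                                # average channels -> grayscale
    img = img[..., :3].mean(axis=2)

for k in (1, 2, 10, 50):
    plt.figure()
    plt.imshow(rank_k_approx(img, k), cmap="gray")
    plt.title(f"rank {k} approximation")
plt.show()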

 

Exercise 4.3 K-means and precoding

Investigate the code ex04_kmeans.ipynb, which uses the following idea: first it clusters the data vectors of length 64, corresponding to flattened 8x8 images of the digits 0-9, into 10 clusters. It then represents each image by the 10 distances to the 10 cluster centers, so each image is now a vector of length 10. Finally, it compares the performance of a logistic regression classifier that uses these 10 numbers to one that uses the original 64 numbers.

(Extra: you can plot the 10 cluster centers to get a visual intuition of what the k-means algorithm is doing.)

Unfortunately, it turns out that the performance gets worse. But the idea can actually be made to work: try to tune the algorithm so that an improvement is obtained. You should be able to get about a 1% average improvement, possibly even 2%.
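For reference, a minimal sketch of the precoding idea on sklearn's 8x8 digits data (n_clusters=10 as in the notebook; tuning this choice is the point of the exercise):

import numpy as np
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression

X, y = load_digits(return_X_y=True)               # 8x8 images flattened to length 64
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

# Baseline: logistic regression on the raw 64 pixel values.
base = LogisticRegression(max_iter=5000).fit(Xtr, ytr)
print("raw pixels:", base.score(Xte, yte))

# Precoding: represent each image by its distances to the cluster centers.
km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(Xtr)
Ztr, Zte = km.transform(Xtr), km.transform(Xte)   # distances to the centers
pre = LogisticRegression(max_iter=5000).fit(Ztr, ytr)
print("cluster distances:", pre.score(Zte, yte))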

Exercise 4.4 K-means

Implement your own K-means algorithm (in a language of your own choice). Try it on a data set of your choice.
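If you want something to compare your own implementation against, here is a minimal NumPy sketch of the alternating assignment/update loop (plain random initialization is assumed; K-means++ initialization would be better):

import numpy as np

def kmeans(X, k, n_iters=100, seed=0):
    """Plain K-means; X has shape (n_samples, n_features)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each point to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute each center as the mean of its assigned points.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels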

 

Exercise 4.5 LDA

Prove that the decision regions in LDA are bounded by linear functions.

Hint: The decision boundary between two classes is most easily found by using the fact that the LDA classifier chooses the class $i$ with the highest posterior probability, which according to Bayes' rule is proportional to the likelihood times the prior (the probability of the data belonging to class $i$); taking logarithms gives
$\log p(y=i \mid x_\star) \sim \log\big(p(x_\star \mid y=i)\,p(y=i)\big) = \mathrm{const}_i - \tfrac{1}{2}(x_\star - \mu_i)^T \Sigma^{-1} (x_\star - \mu_i).$
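A sketch of the step this hint points to: the boundary between classes $i$ and $j$ is where the two log-posteriors are equal,

$\mathrm{const}_i - \tfrac{1}{2}(x_\star - \mu_i)^T \Sigma^{-1} (x_\star - \mu_i) = \mathrm{const}_j - \tfrac{1}{2}(x_\star - \mu_j)^T \Sigma^{-1} (x_\star - \mu_j),$

and since $\Sigma$ is shared between the classes in LDA, the quadratic term $-\tfrac{1}{2} x_\star^T \Sigma^{-1} x_\star$ appears on both sides and cancels, leaving an equation that is linear in $x_\star$.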

 

Exercise 4.6 PCA and MNIST

Investigate the code ex04_pcaMNIST.ipynb.
a) How far can you compress the images before classifier performance deteriorates?

b) In terms of representing as much variance as possible with as few components as possible, was it a good idea to apply the standard scaler on each pixel? Try without.
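Not the notebook's code, but a minimal sketch of the kind of experiment a) and b) ask for, using sklearn's small 8x8 digits set as a stand-in for MNIST:

from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

X, y = load_digits(return_X_y=True)
Xtr, Xte, ytr, yte = train_test_split(X, y, random_state=0)

for scale in (True, False):
    for n in (2, 5, 10, 20, 40, 64):
        # Optionally standardize each pixel, then compress with PCA and classify.
        steps = ([StandardScaler()] if scale else []) + [
            PCA(n_components=n),
            LogisticRegression(max_iter=5000),
        ]
        acc = make_pipeline(*steps).fit(Xtr, ytr).score(Xte, yte)
        print(f"scaled={scale}, n_components={n}: accuracy {acc:.3f}")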

 

 

Solutions: sol4new.pdf

Exercise 4.3: try n_clusters=100.
