Exercise 4
Exercise 4.1 Boosting on sonar data
Run the notebook ex04_boosting.ipynb, where you will tweak some code that implements boosting with a weight vector and a simple loop. Then compare it to sklearn's AdaBoost implementation and see how well you got it to work; you should be able to reach similar performance with rather simple means.
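For orientation, here is a minimal AdaBoost-style loop with an explicit weight vector, in the spirit of the exercise (a sketch only; the variable names and the use of decision stumps are our assumptions, not taken from the notebook):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_fit(X, y, n_rounds=50):
    """Minimal AdaBoost. Labels y must be in {-1, +1}."""
    n = len(y)
    w = np.full(n, 1.0 / n)                # sample weights, start uniform
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = w[pred != y].sum()           # weighted training error
        alpha = 0.5 * np.log((1 - err) / (err + 1e-12))
        w *= np.exp(-alpha * y * pred)     # up-weight misclassified points
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    # Weighted vote of the weak learners
    scores = sum(a * s.predict(X) for s, a in zip(stumps, alphas))
    return np.sign(scores)
```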
Exercise 4.2 Singular Value Decomposition
a) Prove that if $U$ is a unitary matrix ($U^TU = UU^T = I$) and $y = Ux$, then $\|y\|_2^2 = \|x\|_2^2$, i.e. the sum of squares of the elements is the same for $y$ and $x$ ("unitary matrices don't change lengths").
b) Given a matrix $A$, one way of finding the $U$, $S$ and $V$ in the SVD decomposition $A = USV^T$ is to compute the eigenvectors and eigenvalues of the two matrices $AA^T$ and $A^TA$. Explain why, and how $U$, $S$ and $V$ can be obtained from this information. (It is not the best method, though.)
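As a numerical illustration of the connection (not the derivation you are asked for): the eigenvalues of $A^TA$ are the squared singular values of $A$. A small check, with arbitrary sizes and seed:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))

# np.linalg.eigh sorts eigenvalues in ascending order, so reverse them;
# clip guards against tiny negative values from rounding.
evals, V = np.linalg.eigh(A.T @ A)
s_from_eig = np.sqrt(np.clip(evals[::-1], 0, None))

s = np.linalg.svd(A, compute_uv=False)
print(np.allclose(s_from_eig, s))  # True: eigenvalues of A^T A are s**2
```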
c) Use an SVD $X = USV^T$ to rewrite the normal equations $X^TX\theta = X^Ty$, and solve for $\theta$. Show that the solution (if $X^TX$ is invertible) is given by $\hat{\theta} = V_1 S_1^{-1} U_1^T y$.
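The formula is easy to verify numerically; a sketch (random data, thin SVD via numpy, all names ours):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 3))
y = rng.standard_normal(20)

# Thin (economy-size) SVD: X = U1 S1 V1^T
U1, s1, V1t = np.linalg.svd(X, full_matrices=False)
theta_svd = V1t.T @ ((U1.T @ y) / s1)   # V1 S1^{-1} U1^T y

theta_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(theta_svd, theta_lstsq))  # True
```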
d) Same problem, but with Tikhonov regularisation, for which $(X^TX + \gamma I_p)\theta = X^Ty$.
e) In the lecture we introduced the projections $P_1 = U_1U_1^T$ and $Q_1 = V_1V_1^T$. Prove the claims stated on slides 37-38 (i.e. that $P_1^2 = P_1$, etc.).
f) The SVD can be used to approximate a matrix $A = USV^T$ with a matrix of lower rank. Write MATLAB or Python code that takes an image, represented as a matrix of pixel intensities, and calculates an optimal rank-$k$ approximation $A_k$. Try different values of $k = 1, 2, 10, \ldots$ and plot the images $A_k$. (A sketch in Python is given after the hint.)
Hint: $A_k = U_1 S_1 V_1^T$, where $U_1$, $S_1$ and $V_1$ are the parts corresponding to the $k$ largest singular values.
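A minimal Python sketch of the rank-$k$ approximation; the test image from scipy.datasets is just one convenient choice (it requires SciPy >= 1.10), and any grayscale image matrix works:

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import datasets

A = datasets.face(gray=True).astype(float)   # image as a matrix of intensities
U, s, Vt = np.linalg.svd(A, full_matrices=False)

for k in (1, 2, 10, 50):
    # Keep the k largest singular values/vectors: A_k = U1 S1 V1^T
    Ak = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]
    plt.figure()
    plt.imshow(Ak, cmap="gray")
    plt.title(f"rank-{k} approximation")
plt.show()
```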
Exercise 4.3 K-means and precoding
Investigate the code ex04_kmeans.ipynb, which uses the following idea: first it clusters the data vectors of length 64, corresponding to flattened 8x8 images of the digits 0-9, into 10 clusters. Then it represents each image as a vector of 10 coordinates corresponding to the distances to the 10 cluster centers, so each image is now represented by a vector of length 10. Finally, it compares the performance of a logistic regression classifier using these 10 numbers with one using the original 64.
(Extra: you can plot the 10 cluster centers to get a visual intuition for what the k-means algorithm is doing.)
Unfortunately, performance gets worse. But the idea can actually be made to work: try to tune the algorithm so that an improvement is obtained. You should be able to get about 1% average improvement, possibly even 2%. A sketch of the pipeline is given below.
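A sketch of the precoding pipeline on sklearn's digits data (the parameter choices are ours to tune; n_clusters=100 follows the hint under Solutions below):

```python
from sklearn.datasets import load_digits
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)  # 64 features = flattened 8x8 images
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

km = KMeans(n_clusters=100, n_init=10, random_state=0).fit(X_tr)
# KMeans.transform maps each image to its distances to the cluster centers
Z_tr, Z_te = km.transform(X_tr), km.transform(X_te)

clf = LogisticRegression(max_iter=5000).fit(Z_tr, y_tr)
print("precoded accuracy:", clf.score(Z_te, y_te))

base = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
print("raw-pixel accuracy:", base.score(X_te, y_te))
```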
Exercise 4.4 K-means
Implement your own k-means algorithm (choose the language yourself). Try it on a data set of your choice. One possible skeleton is sketched below.
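If you want something to compare against, here is a plain NumPy version of Lloyd's algorithm (one of many possible designs; initialisation by sampling k data points is a simplification):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's algorithm: alternate assignments and mean updates."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(n_iter):
        # Assign each point to its nearest center
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each center to the mean of its assigned points
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):  # converged
            break
        centers = new_centers
    return centers, labels
```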
Exercise 4.5 LDA
Prove that the decision regions in LDA are bounded by linear functions.
Hint: The decision boundary between two classes is most easily found by using the fact that the LDA classifier chooses the class $i$ with the highest posterior probability, which according to Bayes' rule is proportional to the likelihood scaled by the prior (the probability of data belonging to class $i$):
$$\log p(y=i \mid x_\star) \sim \log\big(p(x_\star \mid y=i)\,p(y=i)\big) = \mathrm{const}_i - \tfrac{1}{2}(x_\star - \mu_i)^T \Sigma^{-1} (x_\star - \mu_i).$$
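A sketch of the step the hint points at: on the boundary between classes $i$ and $j$ the two log-posteriors are equal, and since $\Sigma$ is shared the quadratic term $x_\star^T \Sigma^{-1} x_\star$ cancels, leaving
$$(\mu_i - \mu_j)^T \Sigma^{-1} x_\star = \tfrac{1}{2}\mu_i^T \Sigma^{-1} \mu_i - \tfrac{1}{2}\mu_j^T \Sigma^{-1} \mu_j + \mathrm{const}_j - \mathrm{const}_i,$$
which is linear in $x_\star$.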
Exercise 4.6 PCA and MNIST
Investigate the code ex04_pcaMNIST.ipynb
a) How far can you compress the images before classifier performance deteriorates?
b) In terms of representing as much variance as possible with as few components as possible, was it a good idea to apply the standard scaler to each pixel? Try without. (A sketch of this kind of experiment is given below.)
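A sketch of the kind of experiment involved (our parameter choices; the notebook's actual pipeline may differ, and the subsample sizes just keep the runtime short):

```python
from sklearn.datasets import fetch_openml
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = fetch_openml("mnist_784", version=1, return_X_y=True, as_frame=False)
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, train_size=10_000, test_size=2_000, random_state=0)

for n in (10, 30, 50, 100):
    # For (b), drop StandardScaler() from the pipeline and compare
    pipe = make_pipeline(StandardScaler(), PCA(n_components=n),
                         LogisticRegression(max_iter=1000))
    pipe.fit(X_tr, y_tr)
    print(n, "components:", pipe.score(X_te, y_te))
```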
Solutions: sol4new.pdf
Exercise 4.3: try n_clusters=100.