Exercise 4
- Due: No due date
- Points: None
Exercise 4.1 Boosting on sonar data
Run the notebook ex6_boosting.ipynb, where you will tweak some code that implements boosting with a weight vector and a simple loop. Then compare it to sklearn's AdaBoost implementation and see how well you got it to work; you should be able to reach similar performance with rather simple means.
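For orientation, here is a minimal sketch of the weight-vector-and-loop idea (not the notebook's code; the function names are mine, and it assumes binary labels encoded as -1/+1, which the sonar labels would first be mapped to):

```python
# Minimal AdaBoost with decision stumps, as a sketch of the idea in the
# notebook. Assumes labels y are in {-1, +1}.
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def simple_adaboost(X, y, n_rounds=50):
    n = len(y)
    w = np.full(n, 1.0 / n)                    # start with uniform weights
    stumps, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        pred = stump.predict(X)
        err = np.clip(w @ (pred != y), 1e-10, 1 - 1e-10)  # weighted error
        alpha = 0.5 * np.log((1 - err) / err)  # weight of this weak learner
        w *= np.exp(-alpha * y * pred)         # upweight misclassified points
        w /= w.sum()
        stumps.append(stump)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(X, stumps, alphas):
    return np.sign(sum(a * s.predict(X) for s, a in zip(stumps, alphas)))
```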
Exercise 4.2 Singular Value Decomposition
a) Prove that if $U$ is a unitary matrix and $B = UA$, then $\|B\|_F = \|A\|_F$, i.e. the sum of squares of the matrix elements is the same for $A$ and $B$ ("Unitary matrices don't change lengths").
b) Given a matrix $A$, one way of finding the $U$, $S$ and $V$ in the SVD decomposition $A = USV^T$ is to compute the eigenvectors and eigenvalues of the two matrices $A^TA$ and $AA^T$. Explain why, and how $S$ can be obtained from this information. (But it is not the best method.)
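A quick numerical sanity check of the relationship (my own illustration, not part of the exercise):

```python
# The eigenvalues of A^T A are the squared singular values of A, and its
# eigenvectors are the columns of V (up to sign).
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((6, 4))

U, s, Vt = np.linalg.svd(A, full_matrices=False)
eigvals, eigvecs = np.linalg.eigh(A.T @ A)      # eigh sorts ascending

print(np.allclose(s, np.sqrt(eigvals[::-1])))   # True
```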
c) Use an SVD to rewrite the normal equation $A^TAx = A^Tb$ and solve for $x$. Show that the solution (if $A^TA$ is invertible) is given by $x = VS^{-1}U^Tb$.
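A sketch of the algebra, assuming the economy SVD $A = USV^T$ with $S$ square, which is invertible precisely when $A^TA$ is:

$$A^TAx = A^Tb \;\Longrightarrow\; VS^2V^Tx = VSU^Tb \;\Longrightarrow\; x = VS^{-2}V^T \cdot VSU^Tb = VS^{-1}U^Tb,$$

using $U^TU = I$ and $V^TV = VV^T = I$.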
d) Same problem, but with Tikhonov regularisation, for which the normal equation becomes $(A^TA + \lambda I)x = A^Tb$.
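The same substitution as in c) then gives, as a sketch:

$$(A^TA + \lambda I)x = A^Tb \;\Longrightarrow\; V(S^2 + \lambda I)V^Tx = VSU^Tb \;\Longrightarrow\; x = V(S^2 + \lambda I)^{-1}SU^Tb.$$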
e) In the lecture we introduced the projections $P$ and $P^\perp$. Prove the claims stated on slides 37-38 (i.e. that $P^2 = P$ etc.).
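The slides themselves are not reproduced here, but if $P = UU^T$ is the usual projection onto the column space (with $U$ having orthonormal columns) and $P^\perp = I - P$, the idempotence claims follow in one line:

$$P^2 = UU^TUU^T = U(U^TU)U^T = UU^T = P, \qquad (P^\perp)^2 = (I - P)^2 = I - 2P + P^2 = I - P = P^\perp.$$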
f) The SVD can be used to approximate a matrix with a matrix of lower rank. Write Matlab or Python code that takes an image, represented as a matrix $A$ of pixel intensities, and calculates an optimal rank-$k$ approximation $A_k$. Try different values of $k$ and plot the images $A_k$.
Hint: $A_k = U_k S_k V_k^T$, where $U_k$, $S_k$ and $V_k$ are the parts corresponding to the $k$ largest singular values.
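A minimal Python sketch of f) (the course also provides svd_rank_k_approx.m below; the file name "image.png" is a placeholder):

```python
# Best rank-k approximation of an image matrix via the SVD (Eckart-Young).
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.image import imread

def rank_k_approx(A, k):
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

A = imread("image.png")[..., :3].mean(axis=2)  # assumes an RGB(A) image file

for i, k in enumerate([5, 20, 50], start=1):
    plt.subplot(1, 3, i)
    plt.imshow(rank_k_approx(A, k), cmap="gray")
    plt.title(f"k = {k}")
plt.show()
```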
Exercise 4.3 K-means and precoding
Investigate the code kmeans.ipynb, which uses the following idea: first it clusters the data vectors of length 64, corresponding to flattened 8x8 images of the digits 0-9, into 10 clusters. Then it represents each image as a vector of 10 coordinates, namely the distances to the 10 cluster centers, so each image is now represented by a vector of length 10. It then compares the performance of a logistic regression classifier on these 10 numbers to that of a classifier using the original 64 numbers.
Unfortunately, the performance turned out to be worse. But the idea can actually be made to work: try to tune the algorithm so that a positive improvement is obtained. You should be able to get about a 1% average improvement, possibly even 2%.
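A sketch of how such an experiment can be set up, assuming sklearn's digits data as in the notebook (variable names are mine; n_clusters=100 follows the hint given with the solutions below):

```python
# Precoding: represent each 64-pixel image by its distances to k-means
# cluster centers, then classify with logistic regression.
from sklearn.datasets import load_digits
from sklearn.cluster import KMeans
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_digits(return_X_y=True)            # flattened 8x8 digit images
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

km = KMeans(n_clusters=100, n_init=10, random_state=0).fit(X_tr)
Z_tr, Z_te = km.transform(X_tr), km.transform(X_te)   # distances to centers

raw = LogisticRegression(max_iter=5000).fit(X_tr, y_tr)
pre = LogisticRegression(max_iter=5000).fit(Z_tr, y_tr)
print("raw 64 features:      ", raw.score(X_te, y_te))
print("100 cluster distances:", pre.score(Z_te, y_te))
```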
Exercise 4.4 K-means
Implement your own K-means algorithm (in a language of your choice). Try it on a data set of your choice.
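A minimal NumPy sketch of one possible implementation (Lloyd's algorithm; the function name and interface are mine):

```python
import numpy as np

def my_kmeans(X, k, n_iter=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random init
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center moves to the mean of its assigned points.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)])
        if np.allclose(new_centers, centers):
            break                                            # converged
        centers = new_centers
    return labels, centers
```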
Exercise 4.5 LDA
Prove that the decision regions in LDA are bounded by linear functions.
Hint: The decision region between two classes is most easily determined from the fact that the LDA classifier chooses the class $k$ giving the largest log-likelihood $\log \pi_k - \frac{1}{2}(x-\mu_k)^T\Sigma^{-1}(x-\mu_k)$ (up to additive terms common to all classes).
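One way to see the linearity, assuming the usual LDA model of Gaussian class densities with a shared covariance $\Sigma$: on the boundary between classes $k$ and $l$ the two log-likelihoods are equal, and the quadratic term $x^T\Sigma^{-1}x$ cancels, leaving

$$x^T\Sigma^{-1}(\mu_k - \mu_l) = \tfrac{1}{2}\big(\mu_k^T\Sigma^{-1}\mu_k - \mu_l^T\Sigma^{-1}\mu_l\big) + \log\frac{\pi_l}{\pi_k},$$

which is linear in $x$.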
Exercise 4.6 PCA and MNIST
Investigate the code lec6pcaMNIST.ipynb.
a) How far can you compress the images before classifier performance deteriorates?
b) Was it a good idea to apply the standard scaler to each pixel? Try without.
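A sketch of the kind of sweep a) asks for, using sklearn's digits data as a stand-in (the actual notebook is not reproduced here, and its loading and scaling may differ):

```python
# Sweep the number of PCA components and watch test accuracy.
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

for n in [5, 10, 20, 40, 64]:
    clf = make_pipeline(StandardScaler(), PCA(n_components=n),
                        LogisticRegression(max_iter=5000))
    clf.fit(X_tr, y_tr)
    print(f"{n:3d} components: test accuracy {clf.score(X_te, y_te):.3f}")
```

For b), drop the StandardScaler step from the pipeline and compare.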
We will also talk more about lab1.
Solutions
- sol4new.pdf
- boosting_answer.ipynb (exercise 4.1)
- svd_rank_k_approx.m (exercise 4.2f)
- exercise 4.3: try n_clusters=100
- my_kmeans.ipynb (exercise 4.4)