Chapter 8 - Dimensionality reduction
SVD
- Singular-value decomposition. Any matrix can be factorized as A = UΣV∗, where U and V are orthogonal (in general, unitary) matrices and Σ is a diagonal matrix with non-negative entries. For this session it is important to relate matrix multiplication to data transformation. Consider two independent uniformly distributed random variables X1, X2, and define X = [X1, X2]⊤, sampled and plotted below.
What does AX = UΣV∗X do? Specifically, what does each of the matrices U, Σ and V∗ do when applied to this dataset?
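To build intuition, here is a minimal sketch in numpy (the matrix A below is an arbitrary example, not one given in the text): it samples the uniform data, computes the SVD of A, and applies V∗, then Σ, then U one factor at a time, so each intermediate dataset can be plotted and compared.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
X = rng.random((2000, 2)) - 0.5        # two independent uniform variables, centred at the origin

A = np.array([[2.0, 1.0],
              [0.5, 1.5]])             # arbitrary example matrix
U, s, Vt = np.linalg.svd(A)            # A = U @ np.diag(s) @ Vt

X1 = X @ Vt.T                          # apply V*: a rotation/reflection
X2 = X1 * s                            # apply Σ: stretch each axis by a singular value
X3 = X2 @ U.T                          # apply U: another rotation/reflection; X3 equals X @ A.T

for data, title in [(X, "X"), (X1, "V*X"), (X2, "ΣV*X"), (X3, "UΣV*X")]:
    plt.figure()
    plt.scatter(data[:, 0], data[:, 1], s=2)
    plt.title(title)
    plt.axis("equal")
plt.show()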
PCA
- SVD on Gaussian Variables
I believe this is quite well covered in the book, but we will discuss a couple of things. If you work with sampled signals or pictures, numpy.linalg's SVD algorithm cannot be applied naively. Why? Run the following code:
import numpy as np
X = np.random.rand(100000, 2)   # 100000 samples of two signals
np.linalg.svd(X)                # by default this builds the full 100000 x 100000 matrix U
You will most likely get
MemoryError: Unable to allocate 74.5 GiB for an array with shape (100000, 100000) and data type float64
Is it an unreasonable request to want to perform SVD on this extremely tall matrix? Not really: say you want to whiten a couple of signals; 100000 samples is not that much. Is there a workaround?
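One possible workaround, sketched below: ask numpy for the reduced ("economy") SVD, so the enormous square U is never formed. The last two lines show one way to use it for whitening.
import numpy as np

X = np.random.rand(100000, 2)

# Reduced SVD: U has shape (100000, 2) instead of (100000, 100000)
U, s, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)

# Whitened signals: decorrelated and rescaled to unit variance
X_white = U * np.sqrt(X.shape[0] - 1)
print(np.cov(X_white, rowvar=False))   # approximately the identity matrix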
A *note* on incremental PCA. The original authors of the algorithm included a forgetting factor in order to prioritize new data over old data (in online settings). This is, unfortunately, not implemented in scikit-learn. Does anyone know what algorithms to use in this case?
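For reference, a minimal sketch of scikit-learn's IncrementalPCA (the version without a forgetting factor), fitted batch by batch with partial_fit; the data here is synthetic and only for illustration.
import numpy as np
from sklearn.decomposition import IncrementalPCA

rng = np.random.default_rng(0)
ipca = IncrementalPCA(n_components=2)

# Feed the data in batches instead of all at once
for _ in range(100):
    batch = rng.standard_normal((1000, 10))   # synthetic batch: 1000 samples, 10 features
    ipca.partial_fit(batch)

Z = ipca.transform(rng.standard_normal((5, 10)))   # project new samples onto the learned components
print(ipca.explained_variance_ratio_)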
ICA
Independent Component Analysis - Gaussian variables are forbidden.
ICA is not in the book. Using PCA you can whiten (decorrelate) variables. If your input data is normally distributed, it will look something like the above picture. A linear transformation of data points can reflect/rotate (around the origin), stretch, or combine these, meaning that any linear transformation of normally distributed data will still look like an ellipsoid. We see that the two random variables are negatively correlated in the above example. If the input data were instead decorrelated and normalized, it would look like a ball (circle). We can reconstruct this ball by using PCA to find the axes of highest variance, which are the semi-axes of the ellipsoid, and normalize them. The problem is that a ball is invariant under rotation, meaning we cannot separate X from AX if A is a rotation matrix. Contrast this with the uniform square from the first section, which is invariant only to rotations of 90°.
Exercise 1.
a) Sample a 2d independent uniformly distributed random variable and multiply it by a rotation matrix. The following code can be used to sample.
import numpy as np
X = np.random.rand(2000, 2) - 0.5   # 2000 samples of two independent uniform variables on [-0.5, 0.5)
Hint: A rotation around the origin by θ is of the following form
A = [[cos(θ), −sin(θ)],
     [sin(θ),  cos(θ)]].
b) On your now rotated dataset, apply a stretching matrix (i.e. a diagonal matrix with strictly positive entries).
c) Once more rotate your dataset (using a different angle than in a). What does your dataset look like?
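A minimal sketch of steps a) to c); the particular angle and stretch factors below are arbitrary choices, not prescribed by the exercise.
import numpy as np
import matplotlib.pyplot as plt

def rotation(theta):
    """2x2 rotation matrix for an angle theta (radians)."""
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

X = np.random.rand(2000, 2) - 0.5          # centred uniform samples
X_a = X @ rotation(0.4).T                  # a) rotate
X_b = X_a @ np.diag([3.0, 0.5])            # b) stretch the axes
X_c = X_b @ rotation(1.1).T                # c) rotate again, with a different angle

plt.scatter(X_c[:, 0], X_c[:, 1], s=2)
plt.axis("equal")
plt.show()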
-----------------------------------------------------------------------------
Linear ICA is about doing precisely steps a), b) and c), but backwards, i.e. starting from a dataset, find the sources with the least dependence. Assuming sources of unit variance (you can always rescale them later), we can reverse-engineer the last two steps of the matrix multiplication with PCA, i.e. find the axes ordered from highest to least variance and normalize them. The remaining step is not so easy: we want to undo the first rotation by minimizing dependence. There is no foolproof metric for dependence between random variables, so ICA is largely about finding a good proxy for independence. Once you have such a metric, the last step becomes an optimization problem. One approach is to look at higher-order moments, for example kurtosis, but there are other ways as well.
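As an illustration of that idea, here is a minimal sketch, assuming two sources and using excess kurtosis as the (imperfect) proxy for independence: whiten with PCA/SVD, then brute-force search over rotation angles for the one that makes the components look least Gaussian. This is a toy version of the idea, not how FastICA is actually implemented.
import numpy as np
from scipy.stats import kurtosis

def whiten(X):
    """Centre the data and rescale the principal axes to unit variance."""
    Xc = X - X.mean(axis=0)
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    return U * np.sqrt(X.shape[0] - 1)

def rotation(theta):
    return np.array([[np.cos(theta), -np.sin(theta)],
                     [np.sin(theta),  np.cos(theta)]])

def unmix(X, n_angles=360):
    """Search for the rotation maximizing total |excess kurtosis| of the components."""
    Z = whiten(X)
    best_angle, best_score = 0.0, -np.inf
    for theta in np.linspace(0, np.pi / 2, n_angles):
        S = Z @ rotation(theta).T
        score = np.abs(kurtosis(S, axis=0)).sum()
        if score > best_score:
            best_angle, best_score = theta, score
    return Z @ rotation(best_angle).T

# Example: mix two independent uniform sources and try to recover them
rng = np.random.default_rng(0)
S_true = rng.random((2000, 2)) - 0.5
X = S_true @ np.array([[2.0, 1.0], [0.5, 1.5]]).T   # arbitrary mixing matrix
S_est = unmix(X)    # recovered up to permutation, sign and scale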
Exercise 2.
Download teach.zip and extract the two images, import them into Python, resize them to have equal dimensions and flatten them into 1d (or 2d) arrays. Then plot them in a scatter plot. Helpful code:
from PIL import Image
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
im_1 = Image.open(<path>) # To read image from file
plt.imshow(im_1, cmap = mpl.cm.binary, interpolation="nearest") # To show an image in python
im_2 = Image.open(<path>)
im_2_resized = im_2.resize(<tuple>) # Resizing
S1 = np.array(im_1) # Convert to numpy array
...
plt.plot(S1.flatten(), ...)
Challenge 1.
Download separate_this.zip and extract the two images, convert them to numpy arrays, flatten them, and, using sklearn's FastICA algorithm, separate the images; plot them and show me in class.
Clarification: The images, of shape (200, 150, 3) = (height, width, #colors), are already mixed; you only need to separate them.
Hint: When recovering the images you will have to rescale and translate them, then convert them to integers so that they again have shape (200, 150, 3) with pixels taking integer values in [0, 255].
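A possible skeleton for the challenge; here S1 and S2 are assumed to hold the two mixed images as numpy arrays, as in the helpful code above, and the variable names are only placeholders.
import numpy as np
from sklearn.decomposition import FastICA

# X has one mixed, flattened image per column: shape (n_pixels, 2)
X = np.column_stack([S1.flatten(), S2.flatten()]).astype(float)

ica = FastICA(n_components=2, random_state=0)
S_est = ica.fit_transform(X)             # estimated sources, shape (n_pixels, 2)

# Rescale/translate each recovered source to integer pixel values in [0, 255]
S_est -= S_est.min(axis=0)
S_est /= S_est.max(axis=0)
imgs = (255 * S_est).astype(np.uint8)

img_1 = imgs[:, 0].reshape(200, 150, 3)  # back to (height, width, #colors)
img_2 = imgs[:, 1].reshape(200, 150, 3)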
----------------------------------------------------------------------------
Sklearn has implemented the FastICA algorithm, which is nicely described in the following survey paper: https://www.cs.helsinki.fi/u/ahyvarin/papers/NN00new.pdf. However, you should know that ICA is a technology by now, meaning you can apply it without really knowing exactly how it works.
----------------------------------------------------------------------------
t-SNE and UMAP
Have a look at First million integers, laid out with UMAP.
We will discuss t-SNE and UMAP in more depth in class.
UMAP explanation
Don't miss the UMAP Audio Explorer.
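For those who want to experiment before class, a minimal sketch using scikit-learn's TSNE and the separate umap-learn package (assuming it is installed); the digits dataset is just a convenient stand-in.
import matplotlib.pyplot as plt
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
import umap   # from the umap-learn package

X, y = load_digits(return_X_y=True)     # 64-dimensional images of handwritten digits

X_tsne = TSNE(n_components=2, random_state=0).fit_transform(X)
X_umap = umap.UMAP(n_components=2, random_state=0).fit_transform(X)

fig, axes = plt.subplots(1, 2, figsize=(10, 5))
axes[0].scatter(X_tsne[:, 0], X_tsne[:, 1], c=y, s=3, cmap="tab10")
axes[0].set_title("t-SNE")
axes[1].scatter(X_umap[:, 0], X_umap[:, 1], c=y, s=3, cmap="tab10")
axes[1].set_title("UMAP")
plt.show()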