Chapter 14 - Deep Computer Vision Using Convolutional Neural Networks
Responsible for the session: Frida Heskebeck
Chapter summary
Basic idea
- Look at small parts of a figure to find lines in different directions.
- Look at the lines you found and combine them into shapes.
- Look at the shapes to get the full picture.
Receptive field
"The window a neuron is viewing from the layer below"
A neuron at position (i, j) in a layer is connected to the neurons at
- rows i to i + fh − 1, and
- columns j to j + fw − 1
in the layer below, where fh and fw are the height and width of the window. One can also use a stride larger than 1, and then the receptive fields are spaced out on the layer below: with strides sh and sw, the receptive field starts at row i × sh and column j × sw instead.
Example from the picture below: the red neuron is positioned at (5, 0) and has as its receptive field the neurons at rows 5–7 and columns 0–2 in the layer below (a 3 × 3 window).
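A minimal sketch of this index arithmetic in plain Python (the function name is made up for illustration):

```python
def receptive_field(i, j, fh, fw, sh=1, sw=1):
    """Rows and columns in the layer below seen by the neuron at (i, j)."""
    rows = range(i * sh, i * sh + fh)
    cols = range(j * sw, j * sw + fw)
    return list(rows), list(cols)

# The example above: neuron (5, 0) with a 3x3 window sees rows 5-7, cols 0-2.
print(receptive_field(5, 0, fh=3, fw=3))  # ([5, 6, 7], [0, 1, 2])
```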
Convolutional layers
The output of a neuron is the weighted sum of the neurons in its receptive field. The weights are called a filter. A feature map is the picture you get after using the same filter for all possible receptive fields in a layer (the picture above shows one feature map). A convolutional layer has many feature maps, meaning that it looks for many features in the layer below (it uses many different filters). To be precise: the output of a neuron is the weighted sum of the neurons in the receptive field over all feature maps in the layer below. This can be expressed with this monster (more details in the book):

$$z_{i,j,k} = b_k + \sum_{u=0}^{f_h - 1} \sum_{v=0}^{f_w - 1} \sum_{k'=0}^{f_{n'} - 1} x_{i \times s_h + u,\; j \times s_w + v,\; k'} \cdot w_{u,v,k',k}$$

where z_{i,j,k} is the output of the neuron at (i, j) in feature map k, b_k is the bias term for feature map k, and s_h, s_w are the strides.
The size of the weights for a convolutional layer is [fh, fw, fn′, fn], corresponding to: [height of filter, width of filter, number of feature maps in the layer below, number of feature maps in this layer].
The picture below illustrates that a neuron positioned at (i, j) has the same receptive field regardless of which feature map it belongs to, but applies a different filter to that receptive field depending on the feature map.
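A quick shape check (a minimal sketch assuming TensorFlow/Keras, as used in the book; sizes are arbitrary):

```python
import tensorflow as tf

# A 3x3 convolutional layer with 32 feature maps, applied to a
# 28x28 input that has 3 feature maps (channels).
conv = tf.keras.layers.Conv2D(filters=32, kernel_size=3, padding="same")
y = conv(tf.zeros([1, 28, 28, 3]))  # build the layer on a dummy batch

print(conv.kernel.shape)  # (3, 3, 3, 32), i.e. [fh, fw, fn', fn]
print(y.shape)            # (1, 28, 28, 32): one output feature map per filter
```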
Padding
What to do at the edges:
- Valid - Only look at valid data; positions where the window would stick out over the edge are skipped, so some data may be ignored.
- Same - Pad with zeros so that all data is used; with stride 1 the output then has the same dimensions as the input (see the shape check after this list).
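A minimal sketch of the two options, assuming Keras (layer sizes are arbitrary):

```python
import tensorflow as tf

x = tf.zeros([1, 28, 28, 3])
same = tf.keras.layers.Conv2D(8, 3, padding="same")(x)
valid = tf.keras.layers.Conv2D(8, 3, padding="valid")(x)
print(same.shape)   # (1, 28, 28, 8): zero-padded, same spatial size (stride 1)
print(valid.shape)  # (1, 26, 26, 8): edge positions without a full window are dropped
```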
Pooling layers
Reduce the size of the images, keep the important parts of the images, and provide some degree of invariance to small translations.
The receptive field of the layers works in the same way as before. The difference here is that instead of calculating the weighted sum, the neuron takes the maximum value of the receptive field (max pooling) or the average of the receptive field (average pooling).
The pooling is usually done on the feature maps individually.
Max pooling is used more today than average pooling. There is also a global average pooling layer: it calculates the average over each entire feature map, hence outputting as many numbers as there were feature maps in the previous layer.
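The three variants in Keras (a minimal sketch; shapes are arbitrary):

```python
import tensorflow as tf

x = tf.random.normal([1, 28, 28, 32])
print(tf.keras.layers.MaxPool2D(pool_size=2)(x).shape)        # (1, 14, 14, 32)
print(tf.keras.layers.AveragePooling2D(pool_size=2)(x).shape) # (1, 14, 14, 32)
print(tf.keras.layers.GlobalAveragePooling2D()(x).shape)      # (1, 32): one value per feature map
```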
Data augmentation
Increase the number of training instances to get more training data. Tweak the input in some way so that each instance is slightly different but still the same thing (if the picture is of a dog, the tweaked picture should still be of a dog). For pictures one can, for example, shift, rotate, resize, crop, flip, or change exposure/contrast.
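A sketch of how this can be done with the Keras preprocessing layers (the exact factors are arbitrary choices):

```python
import tensorflow as tf

# Random tweaks are applied on the fly during training; at inference time
# these layers pass the image through unchanged.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),
    tf.keras.layers.RandomRotation(0.05),  # rotate up to ~18 degrees
    tf.keras.layers.RandomZoom(0.1),
    tf.keras.layers.RandomContrast(0.2),
])
```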
Architectures
The overall message for the architecture of a CNN: alternate convolutional layers with pooling layers, and end with some fully connected layers. The number of feature maps is usually doubled after each pooling layer.
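A minimal sketch of this pattern in Keras (layer sizes are illustrative, assuming 28 × 28 grayscale input and 10 classes):

```python
import tensorflow as tf
from tensorflow.keras import layers

model = tf.keras.Sequential([
    layers.Conv2D(64, 3, padding="same", activation="relu",
                  input_shape=[28, 28, 1]),
    layers.MaxPool2D(),
    layers.Conv2D(128, 3, padding="same", activation="relu"),  # doubled
    layers.MaxPool2D(),
    layers.Conv2D(256, 3, padding="same", activation="relu"),  # doubled again
    layers.MaxPool2D(),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
```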
Inception module
Several very small convolutional layers run in parallel and their outputs are then concatenated into one output. The 1 × 1 convolutional layers inside the module only look in the depth dimension of the feature maps and act as bottleneck layers, reducing the dimensionality of the feature maps (outputting fewer feature maps than their input). The inception module can be used like any other layer in your network.
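A simplified sketch of such a module with the Keras functional API (the helper name and the uniform filter count n are made up; the real GoogLeNet modules use different counts per branch):

```python
import tensorflow as tf
from tensorflow.keras import layers

def inception_module(x, n):
    # Four parallel branches; the 1x1 convolutions are the bottleneck layers.
    b1 = layers.Conv2D(n, 1, padding="same", activation="relu")(x)
    b2 = layers.Conv2D(n, 1, padding="same", activation="relu")(x)
    b2 = layers.Conv2D(n, 3, padding="same", activation="relu")(b2)
    b3 = layers.Conv2D(n, 1, padding="same", activation="relu")(x)
    b3 = layers.Conv2D(n, 5, padding="same", activation="relu")(b3)
    b4 = layers.MaxPool2D(3, strides=1, padding="same")(x)
    b4 = layers.Conv2D(n, 1, padding="same", activation="relu")(b4)
    return layers.Concatenate()([b1, b2, b3, b4])  # stack along the depth axis
```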
Skip connections
Add the input of a layer to the output of a layer a few steps ahead. This helps when some layer in between has not started learning yet: the signal can still flow through the network via the skip connection.
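A minimal residual-block sketch in Keras (hypothetical helper; real ResNet blocks also use batch normalization):

```python
import tensorflow as tf
from tensorflow.keras import layers

def residual_block(x, n):
    # Assumes x already has n feature maps so the shapes match for the add.
    z = layers.Conv2D(n, 3, padding="same", activation="relu")(x)
    z = layers.Conv2D(n, 3, padding="same")(z)
    z = layers.Add()([z, x])  # the skip connection: add the input back in
    return layers.Activation("relu")(z)
```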
Depthwise separable convolutional layer
The first part looks at only one feature map at a time and uses one spatial filter per feature map. The second part is a regular convolutional layer with a 1 × 1 filter, and hence only looks across the feature maps, not spatially.
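The two parts spelled out in Keras, plus the single-layer version (shapes are arbitrary):

```python
import tensorflow as tf
from tensorflow.keras import layers

x = tf.zeros([1, 28, 28, 32])

# The two parts as separate layers:
z = layers.DepthwiseConv2D(kernel_size=3, padding="same")(x)  # spatial, one filter per feature map
z = layers.Conv2D(64, kernel_size=1)(z)                       # 1x1, across feature maps only

# Keras also provides the combination as a single layer:
z2 = layers.SeparableConv2D(64, kernel_size=3, padding="same")(x)
```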
Fully convolutional networks
Replace the dense layers at the top of a network with convolutional layers. By doing this, the network can be used for inputs of different sizes.
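A sketch of why this works, assuming Keras (a dense layer on top of 7 × 7 feature maps becomes a 7 × 7 "valid" convolution):

```python
import tensorflow as tf
from tensorflow.keras import layers

# A Dense(10) on top of 7x7 feature maps is equivalent to a 7x7 "valid"
# convolution with 10 filters; the convolutional version also accepts
# larger inputs and then outputs a grid of predictions.
fc = layers.Conv2D(10, kernel_size=7, padding="valid", activation="softmax")
print(fc(tf.zeros([1, 7, 7, 64])).shape)    # (1, 1, 1, 10): one prediction
print(fc(tf.zeros([1, 14, 14, 64])).shape)  # (1, 8, 8, 10): one per position
```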
Applications
There are many existing pre-trained networks that one can use. One can use these for transfer learning and retrain only the top layers.
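A minimal transfer-learning sketch with one of the pretrained Keras models (Xception here; input size and number of classes are arbitrary):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Freeze the pretrained base and train only the new top layers.
base = tf.keras.applications.Xception(weights="imagenet", include_top=False)
base.trainable = False

inputs = tf.keras.Input(shape=[224, 224, 3])
z = tf.keras.applications.xception.preprocess_input(inputs)
z = base(z, training=False)
z = layers.GlobalAveragePooling2D()(z)
outputs = layers.Dense(5, activation="softmax")(z)  # 5 = number of classes here
model = tf.keras.Model(inputs, outputs)
```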
Classification and localization - Find an object, mark its bounding box, and label it.
Semantic segmentation - Classify each pixel in a picture.
WaveNet - Generate human-like speech.
EEG data - Many different examples of CNNs applied to EEG data.
Many, many more....
Additional resources
- Paper where the inception module was presented (same reference as in the book): Szegedy et al., "Going Deeper with Convolutions".
- WaveNet.
- EEGNet.
- A nice overview.
- Some applications.
Session Agenda
The meeting plan:
- Go through the chapter summary and discuss that.
- Discuss the suggested exercises.
- If we have time and if you are interested, I can show you what I have done with CNNs and EEG data.
Recommended exercises
The following exercises from the book, with the stated focus:
1. What are the pros and cons of CNNs?
3. What can we do to reduce memory usage while training?
4. What is the point of pooling layers?
9. Practice building a CNN, no need to get super performance but make it run.