Visualizing CNN filters

Scale the filter weights to the 0 - 255 range and display each filter as a small image. This is most interpretable for the first convolutional layer, whose filters act directly on RGB pixels.
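A minimal sketch of this in PyTorch (the framework and model are my assumptions; the notes name neither):

```python
import torch
import torchvision

# Pre-trained AlexNet; its first conv layer holds 64 filters of shape 3x11x11.
model = torchvision.models.alexnet(weights="DEFAULT")
w = model.features[0].weight.data.clone()

# Scale each filter independently to the 0-255 range for display.
w_min = w.amin(dim=(1, 2, 3), keepdim=True)
w_max = w.amax(dim=(1, 2, 3), keepdim=True)
w_scaled = ((w - w_min) / (w_max - w_min) * 255).byte()
# Each w_scaled[k] is now a 3x11x11 uint8 image of filter k.
```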

Visualizing the distribution

1) Take the features from the required layer and reduce them to 2 dimensions with PCA or t-SNE (t-distributed stochastic neighbor embedding; preferred)

2) Plot the reduced features for all data samples.

Here is a plot of t-SNE-reduced feature points from AlexNet with their corresponding images embedded. Similar images end up clustered together.
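A hedged sketch of the reduction step with scikit-learn (the `features` array is a placeholder for activations actually extracted from the network):

```python
import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

# Placeholder: in practice these are N x D activations from the chosen layer.
features = np.random.randn(500, 4096)

# Reduce to 2 dimensions with t-SNE (PCA initialization is common practice).
coords = TSNE(n_components=2, init="pca").fit_transform(features)

plt.scatter(coords[:, 0], coords[:, 1], s=5)
plt.title("t-SNE of CNN features")
plt.show()
```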

Saliency maps

Saliency maps help identify which image pixels mattered most for the classification decision.

Find the gradient of the model output (say, the classification score) with respect to the image pixels. A finite-difference approximation for pixel $(i,j)$:

$ \text{SaliencyMap}[i,j] = \frac{f(I) - f(I')}{I[i,j]-c} $

Where

$ I' \quad \text{is the image with pixel } (i,j) \text{ set to some constant } c\\ f \quad \text{is the model} $
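In practice the gradient is computed with a single backward pass rather than by finite differences. A minimal PyTorch sketch (the model choice and random input are illustrative assumptions):

```python
import torch
import torchvision

model = torchvision.models.alexnet(weights="DEFAULT").eval()
img = torch.randn(1, 3, 224, 224, requires_grad=True)  # placeholder input

scores = model(img)                    # class scores, shape (1, 1000)
scores[0, scores.argmax()].backward()  # gradient of the top class score

# Saliency: maximum absolute gradient over the colour channels.
saliency = img.grad.abs().amax(dim=1).squeeze(0)  # shape (H, W)
```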

Instead of the final score, we can also take the gradient of some intermediate activation (i.e., a specific value in some CNN layer's output). This shows the patch of the image that neuron is looking at.

Saliency maps can also be used for coarse semantic segmentation without needing any masked training data. However, with the occlusion formulation above, the model must be run once for every pixel, which is expensive.

Generating synthetic images

1) Define an objective that rewards a high score for a particular class in the model output (to maximize it, use gradient ascent updates)

$ \max_I \quad S_c[I] - \lambda ||I||_2^2 $

2) Compute the gradient of this objective w.r.t. the image pixels

$ I_{t+1} = I_t + \gamma \frac{\partial (S_c[I] - \lambda ||I||_2^2)}{\partial I} $

3) Update the image with this gradient ascent step and repeat (the full loop is sketched below)
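A sketch of the three steps, again assuming a pre-trained torchvision model; `target_class`, `lam`, and `gamma` are illustrative values, not from the notes:

```python
import torch
import torchvision

model = torchvision.models.alexnet(weights="DEFAULT").eval()
target_class, lam, gamma = 130, 1e-3, 1.0  # illustrative choices

I = torch.zeros(1, 3, 224, 224, requires_grad=True)  # start from a blank image
for _ in range(200):
    # Objective: class score minus L2 regularization on the image.
    objective = model(I)[0, target_class] - lam * I.norm(p=2) ** 2
    objective.backward()
    with torch.no_grad():
        I += gamma * I.grad  # gradient ascent step
        I.grad.zero_()
```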

DeepDream

Instead of maximizing a specific class score, maximize the L2 norm of an entire CNN layer's output and update the image. This amplifies whatever features that layer already detects in the image.
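Under the same assumptions as above, the change to the loop is small: swap the class score for the layer norm. The layer slice and step size here are illustrative:

```python
import torch
import torchvision

model = torchvision.models.alexnet(weights="DEFAULT").eval()
layer = model.features[:8]  # activations up to a mid-level conv layer

img = torch.rand(1, 3, 224, 224, requires_grad=True)
for _ in range(100):
    loss = layer(img).norm(p=2) ** 2  # L2 norm of the whole layer output
    loss.backward()
    with torch.no_grad():
        # Normalized gradient step keeps updates at a steady magnitude.
        img += 0.5 * img.grad / (img.grad.abs().mean() + 1e-8)
        img.grad.zero_()
```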

Feature inversion for image reconstruction

An image can be encoded as a feature vector from some CNN layer; feature inversion reconstructs an image whose features match that vector, so it can be used for image reconstruction.

Feature vectors from the earlier layers of a CNN yield reconstructions almost identical to the original image; deeper layers preserve only higher-level content.
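A feature-inversion sketch under the same assumptions (`target_img` stands in for a real preprocessed image):

```python
import torch
import torchvision

model = torchvision.models.alexnet(weights="DEFAULT").eval()
layer = model.features[:6]  # an early layer; deeper slices lose detail

target_img = torch.randn(1, 3, 224, 224)  # placeholder for a real image
with torch.no_grad():
    target_feat = layer(target_img)

# Optimize a random image so its features match the target's.
I = torch.rand(1, 3, 224, 224, requires_grad=True)
opt = torch.optim.Adam([I], lr=0.05)
for _ in range(300):
    opt.zero_grad()
    loss = (layer(I) - target_feat).pow(2).sum()
    loss.backward()
    opt.step()
```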

Texture synthesis (Gram reconstruction)

Instead of matching a single feature vector, we consider correlations between all features by consolidating them in a Gram matrix.

Let $F_l$ be the feature tensor at layer $l$ for the texture source image

$ F_l \in \mathbb{R}^{C_l \times H_l \times W_l} $

Reshape the feature to a matrix

$ F_l \in \mathbb{R}^{C_l \times H_lW_l} $

Compute the Gram matrix at layer $l$ as the product of the feature matrix with its own transpose (equivalently, the sum of outer products of the per-position feature vectors)

$ G_l = F_lF_l^T \in \mathbb{R}^{C_l \times C_l} $

Define the loss function for texture synthesis, where $G_l$ is the Gram matrix of the source texture and $G_l'[I]$ that of the generated image $I$

$ E_l[I] = ||G_l - G_l'[I]||_2^2\\ \min_I \quad L[I] = \sum_l w_l E_l[I] $

Compute the gradient of the loss function w.r.t. the image pixels and update the image (the Gram computation and per-layer loss are sketched below)
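A minimal sketch of the Gram computation and per-layer loss (framework assumed as above):

```python
import torch

def gram(feats: torch.Tensor) -> torch.Tensor:
    # feats: (C_l, H_l, W_l) feature tensor of layer l for one image.
    C, H, W = feats.shape
    F = feats.reshape(C, H * W)  # C_l x (H_l * W_l) feature matrix
    return F @ F.T               # C_l x C_l Gram matrix

def texture_loss(gram_source: torch.Tensor, feats_generated: torch.Tensor):
    # E_l[I]: squared distance between source and generated Gram matrices.
    return (gram_source - gram(feats_generated)).pow(2).sum()
```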

Neural style transfer

Blend two images by combining feature inversion (for content) and Gram reconstruction (for style)

Do a weighted minimization of the following losses:

1) Feature inversion loss against the content image

2) Gram reconstruction loss against the style image

The loss network is a pre-trained network such as VGG. For fast style transfer, we can train a separate feed-forward network for each style instead of running the optimization per image.
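A sketch of the combined objective; the feature lists, layer choice, and weights `alpha`, `beta`, `w` are illustrative assumptions:

```python
import torch

def gram(feats):
    # (C, H, W) features -> C x C Gram matrix.
    C = feats.shape[0]
    F = feats.reshape(C, -1)
    return F @ F.T

def style_transfer_loss(gen_feats, content_feat, style_grams, w, alpha, beta):
    # gen_feats: per-layer features of the generated image (list of tensors);
    # content loss matches one layer, style loss sums Gram losses over layers.
    content_loss = (gen_feats[-1] - content_feat).pow(2).sum()
    style_loss = sum(
        w_l * (gram(f) - g).pow(2).sum()
        for f, g, w_l in zip(gen_feats, style_grams, w)
    )
    return alpha * content_loss + beta * style_loss
```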