Scale the weights to the range 0 to 255
1) Take the features from the required layer and reduce them to 2 dimensions with PCA or t-SNE (t-distributed stochastic neighbor embedding) (preferred)
2) Plot the reduced features for all data samples.
Here is a plot of t-SNE-reduced feature points from AlexNet with their corresponding images embedded. You can find similar images clustered together.
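The reduce-then-plot steps above can be sketched in NumPy. The notes prefer t-SNE (e.g. `sklearn.manifold.TSNE`), but a PCA sketch shows the same idea; the 4096-dim features below are random stand-ins for real layer activations:

```python
import numpy as np

def pca_2d(features):
    """Reduce (N, D) feature vectors to 2-D via PCA (SVD of centered data)."""
    centered = features - features.mean(axis=0)
    # Right singular vectors of the centered data are the principal axes
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ Vt[:2].T  # project onto the top-2 components

rng = np.random.default_rng(0)
feats = rng.normal(size=(100, 4096))  # stand-in for e.g. AlexNet fc7 features
points = pca_2d(feats)                # (100, 2) -> scatter-plot these
```

The 2-D `points` would then be scattered with each sample's thumbnail drawn at its coordinates.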
Helps to identify which image pixels mattered for the classification
Find the gradient of the model output (say classification score) with respect to image pixels.
$ \text{SaliencyMap}[i,j] = \frac{f(I) - f(I')}{I[i,j]-c} $
Where
$ I' \quad \text{is the image with pixel (i,j) set to some constant c}\\ f \quad \text{is the model} $
Instead of the final score, we can also find the gradient of some intermediate activation (i.e., choose a specific value in some CNN layer output). This will show the patch of the image this neuron is looking at.
Saliency maps can also be used for coarse semantic segmentation without needing any masked training data. However, the model must be run repeatedly, once for every pixel
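The finite-difference formula above can be sketched directly in NumPy. The "model" below is a toy linear function whose true gradient is known, so the map can be checked exactly (the model and image are assumptions for illustration; a real CNN would use backpropagation instead of one forward pass per pixel):

```python
import numpy as np

def occlusion_saliency(f, image, c=0.0):
    """SaliencyMap[i,j] = (f(I) - f(I')) / (I[i,j] - c),
    where I' is I with pixel (i,j) set to the constant c."""
    base = f(image)
    sal = np.zeros(image.shape)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            occluded = image.copy()
            occluded[i, j] = c
            denom = image[i, j] - c
            if denom != 0:  # formula is undefined when the pixel already equals c
                sal[i, j] = (base - f(occluded)) / denom
    return sal

# Toy linear model f(I) = sum(w * I): the saliency map recovers w exactly
w = np.array([[1.0, 2.0], [3.0, 4.0]])
f = lambda img: float((w * img).sum())
saliency = occlusion_saliency(f, np.ones((2, 2)))
```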
1) Define an objective that maximizes a particular class score from the model output (to maximize an objective, use gradient ascent updates)
$ \max_I \quad S_c[I] - \lambda ||I||_2^2 $
2) Compute the gradient of this objective w.r.t. image pixels
$ I_{t+1} = I_t + \gamma \frac{\partial (S_c[I] - \lambda ||I||_2^2)}{\partial I} $
3) Update the image
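The three steps above can be sketched in NumPy with a toy linear score $S_c[I] = w \cdot I$, whose gradient is just $w$ (an assumption for illustration; with a real CNN, autograd such as PyTorch would supply the gradient):

```python
import numpy as np

def class_maximize(score_grad, I0, lam=0.5, gamma=0.1, steps=200):
    """Gradient ascent on S_c[I] - lam * ||I||_2^2."""
    I = I0.copy()
    for _ in range(steps):
        g = score_grad(I) - 2.0 * lam * I  # gradient of the regularized objective
        I = I + gamma * g                  # ascent update from the notes
    return I

# Toy score S_c[I] = w . I  =>  dS_c/dI = w; the optimum is w / (2 * lam)
w = np.array([1.0, -2.0])
I_opt = class_maximize(lambda I: w, np.zeros(2))
```

For the linear toy score the iterates converge to $w / (2\lambda)$, which makes the L2 regularizer's shrinking effect easy to see.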
Instead of maximizing a specific class score, maximize the L2 norm of an entire CNN layer's output and update the image
This is useful for image reconstruction. The image can be encoded as a feature vector, which can then be used to decode (reconstruct) the image
The feature vectors from the earlier layers of the CNN reconstruct images nearly identical to the original image
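Feature inversion can be sketched as gradient descent on $||\Phi(I) - \Phi(I_0)||_2^2$. The toy "layer" below is linear so its loss gradient is analytic (an assumption for illustration; a real CNN layer would need autograd):

```python
import numpy as np

def invert_features(phi, loss_grad, target, shape, gamma=0.1, steps=200):
    """Recover an image whose features match `target` by minimizing
    ||phi(I) - target||^2 with gradient descent."""
    I = np.zeros(shape)
    for _ in range(steps):
        residual = phi(I) - target
        I = I - gamma * loss_grad(residual)
    return I

# Toy linear "layer" phi(I) = A @ I; gradient of the loss w.r.t. I is 2 A^T residual
A = np.array([[2.0, 0.0], [1.0, 1.0]])
phi = lambda I: A @ I
loss_grad = lambda r: 2.0 * A.T @ r
original = np.array([3.0, -1.0])
reconstructed = invert_features(phi, loss_grad, phi(original), shape=2)
```

Here the "layer" is invertible, so the reconstruction matches the original exactly; with a real CNN layer, information discarded by pooling and ReLU makes the match only approximate, and more so for deeper layers.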
Instead of matching a single feature vector, we consider all features by consolidating them into a Gram matrix
Let $F_l$ be the feature tensor from layer $l$ of the image for which we want to generate texture
$ F_l \in \mathbb{R}^{C_l \times H_l \times W_l} $
Reshape the feature to a matrix
$ F_l \in \mathbb{R}^{C_l \times H_lW_l} $
Compute the Gram matrix at layer $l$ as the product of the feature matrix with its transpose (this sums the outer products of the per-position feature vectors)
$ G_l = F_lF_l^T \in \mathbb{R}^{C_l \times C_l} $
Define the loss function for texture synthesis
$ E_l[I] = ||G_l - G_l'[I]||_2^2\\ \min_I \quad L[I] = \sum_l w_l E_l[I] $
Compute gradient of loss function w.r.t image pixels and update image
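The Gram-matrix construction and the per-layer loss $E_l$ can be sketched directly in NumPy (the feature tensor below is random, standing in for a real layer activation):

```python
import numpy as np

def gram(F):
    """Gram matrix of a (C, H, W) feature tensor: reshape to (C, H*W), then F F^T."""
    C = F.shape[0]
    Fm = F.reshape(C, -1)  # (C, H*W)
    return Fm @ Fm.T       # (C, C)

def texture_loss(F_target, F_generated):
    """E_l: squared norm of the difference between the two Gram matrices."""
    diff = gram(F_target) - gram(F_generated)
    return float((diff ** 2).sum())

F = np.random.default_rng(1).normal(size=(8, 4, 4))  # C=8, H=W=4
G = gram(F)  # (8, 8) and symmetric; texture_loss(F, F) is 0
```

Because the spatial dimensions are summed out, the Gram matrix captures which channels co-activate while discarding where they activate, which is why it describes texture rather than layout.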
Blending two images by applying feature inversion + Gram reconstruction
Do a weighted minimization of the following losses
Minimize feature inversion loss from content image
Minimize Gram reconstruction loss from style image
The loss network is a pre-trained network such as VGG. We can train a separate feed-forward network for each style
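The weighted combination of the two losses can be sketched as follows (the weights `alpha` and `beta` are hypothetical defaults; in practice they are tuned per content/style pair, and the features would come from the loss network):

```python
import numpy as np

def gram(F):
    Fm = F.reshape(F.shape[0], -1)
    return Fm @ Fm.T

def style_transfer_loss(F_gen, F_content, F_style, alpha=1.0, beta=1e3):
    """Weighted sum of the content (feature inversion) loss and the
    style (Gram reconstruction) loss, both computed on layer features."""
    content_loss = ((F_gen - F_content) ** 2).sum()
    style_loss = ((gram(F_gen) - gram(F_style)) ** 2).sum()
    return float(alpha * content_loss + beta * style_loss)

rng = np.random.default_rng(2)
F_c = rng.normal(size=(4, 3, 3))  # content-image features (stand-in)
F_s = rng.normal(size=(4, 3, 3))  # style-image features (stand-in)
```

The generated image is then updated by descending on this loss (or, for fast style transfer, a feed-forward network is trained to minimize it in a single pass).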