How You See Me - Understanding the working of a Convolutional Neural Network
Rohit Gandikota
Guntur, Andhra Pradesh
I present a simple method to visualize and understand how a CNN looks at an image by backtracking every operation the CNN applies to that image. I also show that the method can identify the salient parts of an image and visualize the regions the network attends to. The method can be improved further to make it faster and more generic. It offers a simpler view, from a different angle, of how this powerful tool works.
Project status: Under Development
Intel Technologies
AI DevCloud / Xeon
Intel Opt ML/DL Framework
Overview / Usage
Convolutional neural networks (CNNs) are among the most powerful tools in present-day science. A lot of research has gone into improving their performance and robustness, while their internal working has been left largely unexplored. They are often described as black boxes that can map non-linear data effectively. This work tries to show how we have taught CNNs to look at an image. Visual results are shown to explain what a CNN is looking at in an image.
The proposed algorithm exploits the basic math behind a CNN to backtrack the important pixels. It is a generic approach that can be applied to any architecture up to VGG (networks built from convolution, max-pooling, and fully connected layers), and it requires no additional training or architectural changes. In the literature, few attempts have been made to explain how learning happens inside a CNN by exploiting the convolution filter maps. The proposed algorithm is simple because it involves no cost functions, filter exploitation, gradient calculations, or probability scores. Further, we demonstrate that the proposed scheme can be used in some important computer vision tasks.
Methodology / Approach
We simply unroll the working of a CNN. A CNN consists of convolution layers, max-pooling layers, and fully connected layers, so we unroll the forward operation of each and backtrack the important nodes in the previous layer that are responsible for the activations in the current layer. The proposed algorithm uses a separate approach for each of the three operations.
- Backtracking through fully connected layers:
We select the top k nodes in the previous layer that are responsible for the excitation of the present node.
- Backtracking through Conv layers:
The nodes can be perceived as [channel, x, y]. Consider the activations in the previous layer's receptive field, multiply them by the weights, and find the node most responsible for the activation. This is done by summing the resulting 3D array along the x and y axes and picking the channel with the highest sum.
- Backtracking through pooling layers:
Consider the previous layer's receptive field and find the maximum activation within it. This is the unrolling of the max-pooling layer.
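The three backtracking steps above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the project's actual implementation: the function names, the array layouts, ranking dense inputs by activation times weight, stride-1 "valid" convolution, non-overlapping pooling windows, and picking the strongest location inside the winning channel are all choices made here for illustration.

```python
import numpy as np

def backtrack_dense(prev_act, weights, node, k=3):
    """Indices of the k previous-layer nodes contributing most to `node`.

    prev_act: (n_prev,) activations; weights: (n_prev, n_curr).
    Assumption: "responsible" is measured as activation * weight.
    """
    contrib = prev_act * weights[:, node]
    top = np.argsort(contrib)[::-1][:k]  # largest contribution first
    return [int(i) for i in top]

def backtrack_maxpool(prev_act, c, y, x, size=2):
    """Previous-layer (channel, y, x) that produced the pooled maximum.

    prev_act: (channels, H, W).
    Assumption: non-overlapping size x size pooling windows.
    """
    window = prev_act[c, y * size:(y + 1) * size, x * size:(x + 1) * size]
    dy, dx = np.unravel_index(np.argmax(window), window.shape)
    return c, y * size + int(dy), x * size + int(dx)

def backtrack_conv(prev_act, kernel, y, x):
    """Previous-layer (channel, y, x) most responsible for one conv node.

    prev_act: (C_in, H, W); kernel: (C_in, kh, kw) for one output channel.
    Assumption: stride-1 convolution without padding.
    """
    kh, kw = kernel.shape[1:]
    patch = prev_act[:, y:y + kh, x:x + kw]  # receptive field of the node
    contrib = patch * kernel                 # activation times weight
    # Sum over the two spatial axes and keep the channel with the highest sum.
    ch = int(np.argmax(contrib.sum(axis=(1, 2))))
    # Assumption: inside that channel, keep the strongest single location.
    dy, dx = np.unravel_index(np.argmax(contrib[ch]), contrib[ch].shape)
    return ch, y + int(dy), x + int(dx)
```

Starting from the predicted class node and applying these functions layer by layer, in the reverse of the forward order, would yield a set of input-pixel coordinates.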
By the time we reach the input layer, we are left with the pixels that are responsible for the prediction output by the CNN. The method can therefore be extended to object detection, object tracking, saliency proposals, etc.
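As a rough illustration of how the backtracked pixels might feed such tasks, the sketch below turns a list of pixel coordinates into a binary saliency mask and a bounding box. The helper names `pixels_to_mask` and `bounding_box` are hypothetical, and this is only one simple way to post-process the pixels, not necessarily the approach used in this project.

```python
import numpy as np

def pixels_to_mask(pixels, height, width):
    """Binary mask marking the backtracked (y, x) pixels."""
    mask = np.zeros((height, width), dtype=bool)
    for y, x in pixels:
        mask[y, x] = True
    return mask

def bounding_box(mask):
    """Tightest (y_min, x_min, y_max, x_max) box around the marked pixels.

    Returns None when the mask is empty.
    """
    ys, xs = np.nonzero(mask)
    if ys.size == 0:
        return None
    return int(ys.min()), int(xs.min()), int(ys.max()), int(xs.max())
```

The mask could be overlaid on the input image to visualize the attention region, while the box gives a crude object proposal.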
Technologies Used
I used Intel's DevCloud for the processing. The TensorFlow library was used to this end.