Convolutional Neural Networks (CNN)- Step 1- Convolution Operation

6 minute read

Step 1 – Convolution Operation

In this tutorial, we are going to learn about convolution, which is the first step in the process that convolutional neural networks undergo. We’ll learn what convolution is, how it works, what elements are used in it, and what its different uses are.

Get ready!

What is convolution?

In purely mathematical terms, convolution is a function derived from two given functions by integration which expresses how the shape of one is modified by the other. That can sound baffling as it is, but to make matters worse, we can take a look at the convolution formula:

If you don’t consider yourself to be quite the math buff, there is no need to worry since this course is based on a more intuitive approach to the concept of convolutional neural networks, not a mathematical or a purely technical one.

Those of you who have practiced any field that entails signal processing are probably familiar with the convolution function.

If you want to do some extra work on your own to scratch beneath the surface with regard to the mathematical aspects of convolution, you can check out this 2017 University professor Jianxin Wu titled “Introduction to Convolutional Neural Networks.”

Let’s get into the actual convolution operation in the context of neural networks. The following example will provide you with a breakdown of everything you need to know about this process.

The Convolution Operation

Here are the three elements that enter into the convolution operation:

  • Input image

  • Feature detector

  • Feature map

As you can see, the input image is the same smiley face image that we had in the previous tutorial. Again, if you look into the pattern of the 1’s and 0’s, you will be able to make out the smiley face in there.

Sometimes a 5×5 or a 7×7 matrix is used as a feature detector, but the more conventional one, and that is the one that we will be working with, is a 3×3 matrix. The feature detector is often referred to as a “kernel” or a “filter,” which you might come across as you dig into other material on the topic.

It is better to remember both terms to spare yourself the confusion. They all refer to the same thing and are used interchangeably, including in this course.

How exactly does the Convolution Operation work?

You can think of the feature detector as a window consisting of 9 (3×3) cells. Here is what you do with it:

  • You place it over the input image beginning from the top-left corner within the borders you see demarcated above, and then you count the number of cells in which the feature detector matches the input image.

  • The number of matching cells is then inserted in the top-left cell of the feature map.

  • You then move the feature detector one cell to the right and do the same thing. This movement is called a and since we are moving the feature detector one cell at time, that would be called a stride of one pixel.

  • What you will find in this example is that the feature detector’s middle-left cell with the number 1 inside it matches the cell that it is standing over inside the input image. That’s the only matching cell, and so you write “1” in the next cell in the feature map, and so on and so forth.

  • After you have gone through the whole first row, you can then move it over to the next row and go through the same process.

It’s important not to confuse the feature map with the other two elements. The cells of the feature map can contain any digit, not only 1’s and 0’s. After going over every pixel in the input image in the example above, we would end up with these results:

By the way, just like feature detector can also be referred to as a kernel or a filter, a feature map is also known as an activation map and both terms are also interchangeable.

What is the point from the Convolution Operation?

There are several uses that we gain from deriving a feature map. These are the most important of them: Reducing the size of the input image, and you should know that the larger your strides (the movements across pixels), the smaller your feature map. In this example, we used one-pixel strides which gave us a fairly large feature map.

When dealing with proper images, you will find it necessary to widen your strides. Here we were dealing with a 7×7 input image after all, but real images tend to be substantially larger and more complex.

That way you will make them easier to read.

Do we lose information when using a feature detector?

The answer is YES. The feature map that we end up with has fewer cells and therefore less information than the original input image. However, the very purpose of the feature detector is to sift through the information in the input image and filter the parts that are integral to it and exclude the rest.

Basically, it is meant to separate the wheat from the chaff.

Why do we aim to reduce the input image to its essential features?

Think of it this way. What you do is detect certain features, say, their eyes and their nose, for instance, and you immediately know who you are looking at.

These are the most revealing features, and that is all your brain needs to see in order to make its conclusion. Even these features are seen broadly and not down to their minutiae.

If your brain actually had to process every bit of data that enters through your senses at any given moment, you would first be unable to take any actions, and soon you would have a mental breakdown. Broad categorization happens to be more practical.

Convolutional neural networks operate in exactly the same way.

How to Convolutional Neural Networks actually perform this operation?

The example we gave above is a very simplified one, though. In reality, convolutional neural networks develop multiple feature detectors and use them to develop several feature maps which are referred to as convolutional layers (see the figure below).

Through training, the network determines what features it finds important in order for it to be able to scan images and categorize them more accurately.

Based on that, it develops its feature detectors. In many cases, the features considered by the network will be unnoticeable to the human eye, which is exactly why convolutional neural networks are so amazingly useful. With enough training, they can go light years ahead of us in terms of image processing.

What are other uses of Convolution Matrices?

There’s another use for convolution matrix, which is actually part of the reason why they are called “filters”. The word here is used in the same sense we use it when talking about Instagram filters.

You can actually use a convolution matrix to adjust an image. Here are a few examples of filters being applied to images using these matrices.

There is really little technical analysis to be made of these filters and it would be of no importance to our tutorial. These are just intuitively formulated matrices. The point is to see how applying them to an image can alter its features in the same manner that they are used to detect these features.

What’s next?

That’s all you need to know for now about the convolution operation. In our next tutorial, we will go through the next part of the convolution step; the ReLU layer.