Welcome back to Deep Learning! Today we want to continue talking about convolutional neural networks, and what we will learn about is one of the most important building blocks of deep networks.
So far, we had those fully connected layers where each input is connected to each node. This is very powerful because it can represent any kind of linear relationship between the inputs: essentially, between every two layers we have one matrix multiplication. This means that from one layer to the next we can have a complete change of representation, but it also means that we have a lot of connections.
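As a tiny sketch of that idea, with made-up sizes (four inputs, three neurons), one fully connected layer in NumPy is just a matrix multiplication plus a bias:

import numpy as np

# Minimal sketch: a fully connected layer is one matrix multiplication plus a bias.
x = np.random.rand(4)        # input vector (4 nodes, made-up size)
W = np.random.rand(3, 4)     # one row of weights per output neuron
b = np.random.rand(3)
y = W @ x + b                # the next layer's pre-activations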
Let's think about images, videos, and sounds in machine learning. Here this is a bit of a disadvantage, because such signals typically have huge input sizes, and you have to think about how to deal with them.
Let's say we have an image with 512 times 512 pixels. Then a single hidden layer with 8 neurons already has (512² + 1) · 8 trainable weights, where the plus one accounts for the bias. That is more than 2 million trainable weights for just a single hidden layer.
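Just to double-check that number, here is the arithmetic as a quick sketch:

# Worked check of the parameter count mentioned above:
pixels = 512 ** 2                  # 262,144 inputs per neuron
weights_per_neuron = pixels + 1    # plus 1 for the bias
total = weights_per_neuron * 8     # 8 neurons in the hidden layer
print(total)                       # 2097160 -> more than 2 million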
Of course, this is not the way to go. Size is really a problem, and there is more to it than that.
So let's say we want to distinguish between a cat and a dog. If you look at those two images, you can see that large parts of them just contain empty areas that are not very relevant.
Pixels in general are very bad features: they are highly correlated, they are scale-dependent, and they show intensity variations, so they are a huge problem. From a machine learning point of view, pixels are a bad representation, and you want to create something that is more abstract and summarizes the information better.
So the question is: can we find a better representation? Of course, we have a certain degree of locality in an image, so we can try to find the same macro features at different locations and then reuse them. Ideally, we want to construct something like a hierarchy of features, where edges and corners form eyes, then eyes, nose, and ears form a face, and face, body, and legs finally form an animal. So composition matters, and if you can learn a better representation, then you can also classify better. This is really key, and it is what we often see in convolutional neural networks: in the early layers you find very simple descriptors, in the intermediate layers you find more abstract representations like eyes and noses, and in the higher layers you find detectors for, for example, entire faces. So we want to have local sensitivity, but then we want to scale it over the entire network in order to also model these layers of abstraction. And we can do that by using convolutional neural networks. So here is the general idea of
such architectures: instead of fully connecting everything with everything, they use a so-called receptive field for every neuron, which acts like a filter kernel, and they apply the same weights over the entire image, essentially a convolution, producing several of these feature maps. The feature maps then go to a pooling layer; the pooling brings in the abstraction and demagnifies the image, so that we can go into the next stage, where we again compute convolutions, producing more layers and more feature maps. These feature maps are then summarized again by a pooling layer, and so on, until we have some abstract representation. This abstract representation is then fed to a fully connected layer, and this fully connected layer in the end maps to the final classes, like car, truck, van, and so on.
This is then the classification result. So we need convolutional layers, we need activation functions, we need pooling to get the abstraction and to reduce the dimensionality, and then, in the last layer, a fully connected one for classification.
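To make this concrete, here is a minimal sketch of such a pipeline in PyTorch; the channel counts, kernel sizes, the 32 × 32 input resolution, and the three classes are assumptions for the example, not values from the lecture:

import torch
import torch.nn as nn

# Minimal sketch of the pipeline described above:
# convolution -> activation -> pooling, repeated, then a fully connected classifier.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=5),    # convolution producing 8 feature maps
    nn.ReLU(),                         # activation function
    nn.MaxPool2d(2),                   # pooling: abstraction + downsampling
    nn.Conv2d(8, 16, kernel_size=5),   # more feature maps
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 3),          # fully connected layer -> 3 classes (e.g. car, truck, van)
)

x = torch.randn(1, 3, 32, 32)          # a dummy 32 x 32 RGB image
logits = model(x)                      # shape (1, 3): one score per class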
So let's start with the convolutional layers. The idea here is that we want to exploit the spatial structure by only connecting pixels in a neighborhood. If we wanted to express this in a fully connected layer, we could set every entry in our weight matrix to zero except for the entries that connect pixels within a local neighborhood.
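As a small sketch of this idea (NumPy, using a 1-D "image" for simplicity, with made-up sizes): a mostly-zero weight matrix that repeats the same local weights in every row computes exactly a convolution.

import numpy as np

# Sketch: a fully connected weight matrix that is zero outside a local
# neighborhood, with the same weights shared across rows, is a convolution.
n = 8
kernel = np.array([1.0, 2.0, 1.0])                 # the shared local weights
W = np.zeros((n - 2, n))                           # mostly-zero "fully connected" matrix
for i in range(n - 2):
    W[i, i:i + 3] = kernel                         # nonzero only in a 3-pixel neighborhood

x = np.arange(n, dtype=float)                      # a 1-D "image"
print(W @ x)                                       # matrix-multiplication view
print(np.convolve(x, kernel[::-1], mode="valid"))  # same numbers via convolution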