Welcome back to Deep Learning.
And today we want to continue talking about convolutional neural networks.
And the thing that we will learn about today is one of the most important building blocks
of deep networks.
Where humans might need just dozens of examples, these networks will need millions upon millions, even for very simple tasks.
So far we had those fully connected layers where each input is connected to each node.
This is very powerful because it can represent any kind of linear relationship between the
inputs.
Essentially, between every layer we have one matrix multiplication.
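To make this concrete, here is a minimal sketch in numpy; the layer sizes are made up purely for illustration, but the point is that the entire fully connected layer is exactly one matrix multiplication plus a bias:

```python
import numpy as np

# A minimal sketch of one fully connected layer. The sizes
# (4 inputs, 3 output nodes) are made up for illustration.
rng = np.random.default_rng(0)

n_in, n_out = 4, 3
x = rng.normal(size=n_in)            # input vector
W = rng.normal(size=(n_out, n_in))   # one weight per input-output pair
b = rng.normal(size=n_out)           # one bias per output node

# The whole layer is a single matrix multiplication plus the bias.
y = W @ x + b
print(y.shape)  # (3,)
```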
Of course, we are building on all these great abstractions that people have invented over
the millennia, such as matrix multiplications.
This essentially means that from one layer to another layer, we can have an entire change
of representation.
And this also means that we have a lot of connections.
If we think about images, video, or sound in machine learning, then this is a bit of a disadvantage, because they typically have huge input sizes. And we need to think about how to deal with these large inputs. Take image recognition, for example.
Let's assume we have an image with 512 × 512 pixels. That means that one hidden layer with eight neurons already has (512² + 1) × 8 trainable weights, the +1 being the bias. And that's more than two million trainable weights for just a single hidden layer.
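The arithmetic is easy to check; a quick sketch:

```python
# Trainable weights of one fully connected hidden layer on a
# 512 x 512 input: (pixels + 1 bias) per neuron, times 8 neurons.
pixels = 512 * 512
neurons = 8
weights = (pixels + 1) * neurons
print(weights)  # 2097160 -- more than two million
```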
Of course, this is not the way to go; size is really a problem. And there's more to it.
So let's say we want to classify between a cat and a dog. If you look at those two images, you can see that a large part of each image just contains empty areas, which are not very relevant.
So pixels in general are very bad features: they are highly correlated, they are scale dependent, and they suffer from intensity variations. So they're a huge problem; from a machine learning point of view, pixels are a poor representation.
We want to create something more abstract that summarizes the information better. So the question is: can we find a better representation?
And of course, we have a certain degree of locality in an image. So we can try to find the same macro features at different locations and then reuse them.
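As a minimal sketch of that idea in numpy (the 3×3 edge filter is just an illustrative choice): a single small set of weights is slid over every location of the image, so the same feature detector is reused everywhere instead of learning separate weights per pixel.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one small filter over the image (no padding, stride 1).
    The same kernel weights are reused at every image location."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.default_rng(0).normal(size=(512, 512))
edge_filter = np.array([[1., 0., -1.],
                        [2., 0., -2.],
                        [1., 0., -1.]])  # classic vertical-edge (Sobel) filter
response = conv2d_valid(image, edge_filter)
print(response.shape)  # (510, 510) -- only 9 shared weights,
                       # instead of millions of per-pixel weights
```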
Ideally, we want to construct something like a hierarchy of features, where edges and corners form eyes; eyes, nose, and ears form a face; and face, body, and legs finally form an animal.
So composition matters.
And if you can learn a better representation, then you can also classify better.
So this is really key.
And what we often see in convolutional neural networks is that in the early layers, you find very simple descriptors.