Welcome back to Deep Learning.
And today we want to continue talking about convolutional neural networks.
And the thing that we will learn about today is one of the most important building blocks
of deep networks.
Where humans might need just dozens of examples, these networks will need millions upon millions, even for very simple tasks.
So far we had those fully connected layers where each input is connected to each node.
This is very powerful because it can represent any kind of linear relationship between the
inputs.
Essentially, between every layer we have one matrix multiplication.
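To make this concrete, here is a minimal sketch in numpy; the layer sizes are made up purely for illustration, but the point is that the entire fully connected layer is exactly one matrix multiplication plus a bias:

```python
import numpy as np

# A minimal sketch of one fully connected layer. The sizes
# (4 inputs, 3 output nodes) are made up for illustration.
rng = np.random.default_rng(0)

n_in, n_out = 4, 3
x = rng.normal(size=n_in)            # input vector
W = rng.normal(size=(n_out, n_in))   # one weight per input-output pair
b = rng.normal(size=n_out)           # one bias per output node

# The whole layer is a single matrix multiplication plus the bias.
y = W @ x + b
print(y.shape)  # (3,)
```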
Of course, we are building on all these great abstractions that people have invented over
the millennia, such as matrix multiplications.
This essentially means that from one layer to another layer, we can have an entire change
of representation.
And this also means that we have a lot of connections.
If we think about images, video, or sound in machine learning, then this is a bit of a disadvantage, because they typically have huge input sizes. And we need to think about how to deal with these large inputs. Take image recognition, for example.
Let's assume we have an image with 512 × 512 pixels. That means that one hidden layer with eight neurons already has (512² + 1) × 8 trainable weights, the +1 being the bias. And that's more than two million trainable weights for just a single hidden layer.
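The arithmetic is easy to check; a quick sketch:

```python
# Trainable weights of one fully connected hidden layer on a
# 512 x 512 input: (pixels + 1 bias) per neuron, times 8 neurons.
pixels = 512 * 512
neurons = 8
weights = (pixels + 1) * neurons
print(weights)  # 2097160 -- more than two million
```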
Of course, this is not the way to go; size is really a problem. And there's more to it.
So let's say we want to classify between a cat and a dog. If you look at those two images, you can see that a large part of each image just contains empty areas, which are not very relevant.
So pixels in general are very bad features: they are highly correlated, they are scale dependent, and they suffer from intensity variations. So they're a huge problem; from a machine learning point of view, pixels are a poor representation.
We want to create something more abstract that summarizes the information better. So the question is: can we find a better representation?
And of course, we have a certain degree of locality in an image. So we can try to find the same macro features at different locations and then reuse them.
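As a minimal sketch of that idea in numpy (the 3×3 edge filter is just an illustrative choice): a single small set of weights is slid over every location of the image, so the same feature detector is reused everywhere instead of learning separate weights per pixel.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide one small filter over the image (no padding, stride 1).
    The same kernel weights are reused at every image location."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

image = np.random.default_rng(0).normal(size=(512, 512))
edge_filter = np.array([[1., 0., -1.],
                        [2., 0., -2.],
                        [1., 0., -1.]])  # classic vertical-edge (Sobel) filter
response = conv2d_valid(image, edge_filter)
print(response.shape)  # (510, 510) -- only 9 shared weights,
                       # instead of millions of per-pixel weights
```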
Ideally, we want to construct something like a hierarchy of features, where edges and corners form eyes; eyes, nose, and ears form a face; and face, body, and legs finally form an animal.
So composition matters.
And if you can learn a better representation, then you can also classify better.
So this is really key.
And what we often see in convolutional neural networks is that in the early layers, you find very simple descriptors.