Welcome back to Deep Learning! Today we want to continue talking about convolutional neural networks, and what we will learn about is one of the most important building blocks of deep networks.
So far, we had those fully connected layers where each input is connected to each node. This is very powerful because it can represent any kind of linear relationship between the inputs: essentially, between every two layers we have one matrix multiplication. This means that from one layer to the next we can have a complete change of representation, but it also means that we have a lot of connections.
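As a tiny sketch of that idea, with made-up sizes (four inputs, three neurons), one fully connected layer in NumPy is just a matrix multiplication plus a bias:

import numpy as np

# Minimal sketch: a fully connected layer is one matrix multiplication plus a bias.
x = np.random.rand(4)        # input vector (4 nodes, made-up size)
W = np.random.rand(3, 4)     # one row of weights per output neuron
b = np.random.rand(3)
y = W @ x + b                # the next layer's pre-activations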
Let's think about images, videos, and sounds in machine learning. Here this is a bit of a disadvantage, because such signals typically have huge input sizes, and you have to think about how to deal with them.
Let's say we have an image with 512 times 512 pixels. Then a single hidden layer with 8 neurons already has (512² + 1) · 8 trainable weights, where the plus one accounts for the bias. That is more than 2 million trainable weights for just a single hidden layer.
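Just to double-check that number, here is the arithmetic as a quick sketch:

# Worked check of the parameter count mentioned above:
pixels = 512 ** 2                  # 262,144 inputs per neuron
weights_per_neuron = pixels + 1    # plus 1 for the bias
total = weights_per_neuron * 8     # 8 neurons in the hidden layer
print(total)                       # 2097160 -> more than 2 million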
Of course, this is not the way to go. Size is really a problem, and there is more to it than that.
So let's say we want to distinguish between a cat and a dog. If you look at those two images, you can see that large parts of them just contain empty areas that are not very relevant.
Pixels in general are very bad features: they are highly correlated, they are scale-dependent, and they show intensity variations, so they are a huge problem. From a machine learning point of view, pixels are a bad representation, and you want to create something that is more abstract and summarizes the information better.
So the question is: can we find a better representation? Of course, we have a certain degree of locality in an image, so we can try to find the same macro features at different locations and then reuse them. Ideally, we want to construct something like a hierarchy of features, where edges and corners form eyes, then eyes, nose, and ears form a face, and face, body, and legs finally form an animal. So composition matters, and if you can learn a better representation, then you can also classify better. This is really key, and it is what we often see in convolutional neural networks: in the early layers you find very simple descriptors, in the intermediate layers you find more abstract representations like eyes and noses, and in the higher layers you find detectors for, for example, entire faces. So we want to have local sensitivity, but then we want to scale it over the entire network in order to also model these layers of abstraction. And we can do that by using convolutional neural networks. So here is the general idea of
such architectures: instead of fully connecting everything with everything, they use a so-called receptive field for every neuron, which acts like a filter kernel, and they apply the same weights over the entire image, essentially a convolution, producing several of these feature maps. The feature maps then go to a pooling layer; the pooling brings in the abstraction and demagnifies the image, so that we can go into the next stage, where we again compute convolutions, producing more layers and more feature maps. These feature maps are then summarized again by a pooling layer, and so on, until we have some abstract representation. This abstract representation is then fed to a fully connected layer, and this fully connected layer in the end maps to the final classes, like car, truck, van, and so on.
This is then the classification result. So we need convolutional layers, we need activation functions, we need pooling to get the abstraction and to reduce the dimensionality, and then, in the last layer, a fully connected one for classification.
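To make this concrete, here is a minimal sketch of such a pipeline in PyTorch; the channel counts, kernel sizes, the 32 × 32 input resolution, and the three classes are assumptions for the example, not values from the lecture:

import torch
import torch.nn as nn

# Minimal sketch of the pipeline described above:
# convolution -> activation -> pooling, repeated, then a fully connected classifier.
model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=5),    # convolution producing 8 feature maps
    nn.ReLU(),                         # activation function
    nn.MaxPool2d(2),                   # pooling: abstraction + downsampling
    nn.Conv2d(8, 16, kernel_size=5),   # more feature maps
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Linear(16 * 5 * 5, 3),          # fully connected layer -> 3 classes (e.g. car, truck, van)
)

x = torch.randn(1, 3, 32, 32)          # a dummy 32 x 32 RGB image
logits = model(x)                      # shape (1, 3): one score per class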
So let's start with the convolutional layers. The idea here is that we want to exploit the spatial structure by only connecting pixels in a neighborhood. If we wanted to express this in a fully connected layer, we could set every entry in our weight matrix to zero except for the entries that connect pixels within a local neighborhood.
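As a small sketch of this idea (NumPy, using a 1-D "image" for simplicity, with made-up sizes): a mostly-zero weight matrix that repeats the same local weights in every row computes exactly a convolution.

import numpy as np

# Sketch: a fully connected weight matrix that is zero outside a local
# neighborhood, with the same weights shared across rows, is a convolution.
n = 8
kernel = np.array([1.0, 2.0, 1.0])                 # the shared local weights
W = np.zeros((n - 2, n))                           # mostly-zero "fully connected" matrix
for i in range(n - 2):
    W[i, i:i + 3] = kernel                         # nonzero only in a 3-pixel neighborhood

x = np.arange(n, dtype=float)                      # a 1-D "image"
print(W @ x)                                       # matrix-multiplication view
print(np.convolve(x, kernel[::-1], mode="valid"))  # same numbers via convolution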