Welcome back to deep learning. Today I want to talk about part two of the architectures, and in this second part we want to go even a bit deeper and look at some deeper models.
So we see that going deeper really was very beneficial for the error rates.
So if we look at the results on ImageNet: in 2011, with a shallow support vector machine, the error rate was still really high, around 25%.
AlexNet already almost cut that in half in 2012, then Zeiler's network, the next winner in 2013, again used eight layers, then VGG in 2014 with 19 layers did even better, and GoogLeNet in 2014 with 22 layers reached almost the same performance.
So you can see that the more we increase the depth, the better the performance seemingly gets, and there is only a little bit of margin left in order to beat human performance.
Depth seems to play a key role in building good networks.
Well, why could that be the case?
One reason why those deeper networks may be very efficient is something that we call exponential
feature reuse.
So here you can see that even if we only have two neurons per layer, stacking layers on top of each other makes the number of possible paths increase exponentially.
So with one layer of two neurons, I have two paths.
With another layer of neurons, I have two to the power of two paths, then with the next layer two to the power of three paths, two to the power of four paths, and so on.
So deeper networks seem to somehow be able to reuse information from the previous layers.
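To make that counting concrete, here is a tiny sketch in plain Python (the layer and neuron counts are chosen only for illustration) that enumerates all input-to-output paths through a stack of layers with two neurons each:

```python
from itertools import product

def count_paths(neurons_per_layer: int, num_layers: int) -> int:
    """Enumerate every path that picks one neuron per layer."""
    layers = [range(neurons_per_layer)] * num_layers
    return sum(1 for _ in product(*layers))

# With 2 neurons per layer the count doubles with every added layer:
for depth in range(1, 6):
    print(depth, count_paths(2, depth))  # 2, 4, 8, 16, 32 -> 2**depth
```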
And we can also see that if we look at what these networks are doing, the visualization results show that they increasingly build more abstract representations.
So we somehow see a kind of modularization happening, and we think that deep learning works because we are able to compute different parts of the function at different positions.
So we are somehow disentangling the processing into simpler steps, and then we essentially train a program with multiple steps that is able to describe more and more abstract representations.
So here we see the first layers, they do maybe edges and blobs.
Let's say layer number three does textures, layer number five object parts, layer number
eight already object classes.
These images here are created from visualizations from AlexNet.
So you can see that this really seems to be happening in the network, and this is probably also a key reason why deep learning works so well: we are able to disentangle the function and to compute different things at different positions.
Well, we want to go deeper, and one technology that has been used there is again the inception module. The improved inception modules essentially replace the large filters that we have seen, like the five by five convolutions, with multiple smaller convolutions.
So the first idea was that, instead of doing a five by five convolution in the inception module, you do two three by three convolutions in a row.
That already saves a couple of computations, and you can generally replace large filters by stacking smaller filters on top of each other. We can see that this actually works for a broad variety of filters: you can separate them into several steps, one after another, and cascade them.
Filter cascading is something that you would also discuss in a typical computer vision
class.
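As a rough sketch of why this factorization saves computation (using PyTorch here only for illustration; the channel count is made up), compare the parameter counts of a single five by five convolution and two stacked three by three convolutions, which cover the same five by five receptive field:

```python
import torch
import torch.nn as nn

channels = 64  # hypothetical channel count, just for the comparison

# A single 5x5 convolution.
conv5 = nn.Conv2d(channels, channels, kernel_size=5, padding=2)

# Two stacked 3x3 convolutions cover the same 5x5 receptive field.
conv3x2 = nn.Sequential(
    nn.Conv2d(channels, channels, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(channels, channels, kernel_size=3, padding=1),
)

params = lambda m: sum(p.numel() for p in m.parameters())
print(params(conv5))    # 64*64*25 + 64      = 102464
print(params(conv3x2))  # 2*(64*64*9 + 64)   = 73856, roughly 28% fewer

# Both produce feature maps of the same spatial size.
x = torch.randn(1, channels, 32, 32)
assert conv5(x).shape == conv3x2(x).shape
```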
So Inception V2 then already had 42 layers. It starts with essentially three by three convolutions and three modified inception modules like the one that we just looked at.
Then, in the next stage, an efficient grid size reduction is introduced that uses strided convolutions: you have one by one convolutions for channel compression and then a three by three convolution with stride two to reduce the spatial resolution.
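As a rough sketch of such an efficient grid size reduction (again in PyTorch, with made-up channel numbers; the exact branch widths and number of branches in Inception V2/V3 differ), a compressed, strided convolutional branch runs in parallel with a strided pooling branch and the results are concatenated along the channel dimension:

```python
import torch
import torch.nn as nn

class GridReduction(nn.Module):
    """Sketch of an efficient grid size reduction: halve the spatial
    resolution with a strided 3x3 convolution (after 1x1 channel
    compression) in parallel with a strided pooling branch, then
    concatenate along the channel axis."""
    def __init__(self, in_ch: int, compress_ch: int, out_ch: int):
        super().__init__()
        self.conv_branch = nn.Sequential(
            nn.Conv2d(in_ch, compress_ch, kernel_size=1),   # 1x1 channel compression
            nn.ReLU(inplace=True),
            nn.Conv2d(compress_ch, out_ch, kernel_size=3,
                      stride=2, padding=1),                 # strided 3x3 halves H and W
            nn.ReLU(inplace=True),
        )
        self.pool_branch = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        return torch.cat([self.conv_branch(x), self.pool_branch(x)], dim=1)

# Hypothetical sizes: 288 input channels on a 35x35 feature map.
block = GridReduction(in_ch=288, compress_ch=64, out_ch=192)
y = block(torch.randn(1, 288, 35, 35))
print(y.shape)  # torch.Size([1, 480, 18, 18])
```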