Welcome back to deep learning. Today I want to talk about part two of the architectures, and in this second part we want to go even a bit deeper and look at some deeper models.
So we see that going deeper really was very beneficial for the error rates.
So if we look at the results on ImageNet: in 2011, with a shallow support vector machine, the error rate was still really high, around 25%.
AlexNet already almost cut that in half in 2012, then Zeiler's network, the next winner in 2013, again used eight layers, then VGG in 2014 with 19 layers did even better, and GoogLeNet in 2014 with 22 layers reached almost the same performance.
So you can see that the more we increase the depth, the better the performance seemingly gets, and there is only a little bit of margin left in order to beat human performance.
Depth seems to play a key role in building good networks.
Well, why could that be the case?
One reason why those deeper networks may be very efficient is something that we call exponential
feature reuse.
So here you can see that even if we only have two neurons per layer, stacking layers on top of each other makes the number of possible paths increase exponentially.
So with one layer of two neurons, I have two paths.
With another layer of neurons, I have two to the power of two paths, then with the next layer two to the power of three paths, two to the power of four paths, and so on.
So deeper networks seem to somehow be able to reuse information from the previous layers.
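To make that counting concrete, here is a tiny sketch in plain Python (the layer and neuron counts are chosen only for illustration) that enumerates all input-to-output paths through a stack of layers with two neurons each:

```python
from itertools import product

def count_paths(neurons_per_layer: int, num_layers: int) -> int:
    """Enumerate every path that picks one neuron per layer."""
    layers = [range(neurons_per_layer)] * num_layers
    return sum(1 for _ in product(*layers))

# With 2 neurons per layer the count doubles with every added layer:
for depth in range(1, 6):
    print(depth, count_paths(2, depth))  # 2, 4, 8, 16, 32 -> 2**depth
```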
And we can also see that if we look at what these networks are doing, the visualization results show that they increasingly build more abstract representations.
So we somehow see a kind of modularization happening, and we think that deep learning works because we are able to compute different parts of the function at different positions.
So we are somehow disentangling the processing into simpler steps, and then we essentially train a program with multiple steps that is able to describe more and more abstract representations.
So here we see the first layers, they do maybe edges and blobs.
Let's say layer number three does textures, layer number five object parts, layer number
eight already object classes.
These images here are created from visualizations from AlexNet.
So you can see that this really seems to be happening in the network, and this is probably also a key reason why deep learning works so well: we are able to disentangle the function and to compute different things at different positions.
Well, we want to go deeper, and one technology that has been used there is again the inception module. The improved inception modules essentially replace the large filters that we have seen, like the five by five convolutions, with multiple smaller convolutions.
So the first idea was that, instead of doing a five by five convolution in the inception module, you do two three by three convolutions in a row.
That already saves a couple of computations, and you can generally replace large filters by stacking smaller filters on top of each other. We can see that this actually works for a broad variety of filters: you can separate them into several steps, one after another, and cascade them.
Filter cascading is something that you would also discuss in a typical computer vision
class.
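As a rough sketch of why this factorization saves computation (using PyTorch here only for illustration; the channel count is made up), compare the parameter counts of a single five by five convolution and two stacked three by three convolutions, which cover the same five by five receptive field:

```python
import torch
import torch.nn as nn

channels = 64  # hypothetical channel count, just for the comparison

# A single 5x5 convolution.
conv5 = nn.Conv2d(channels, channels, kernel_size=5, padding=2)

# Two stacked 3x3 convolutions cover the same 5x5 receptive field.
conv3x2 = nn.Sequential(
    nn.Conv2d(channels, channels, kernel_size=3, padding=1),
    nn.ReLU(inplace=True),
    nn.Conv2d(channels, channels, kernel_size=3, padding=1),
)

params = lambda m: sum(p.numel() for p in m.parameters())
print(params(conv5))    # 64*64*25 + 64      = 102464
print(params(conv3x2))  # 2*(64*64*9 + 64)   = 73856, roughly 28% fewer

# Both produce feature maps of the same spatial size.
x = torch.randn(1, channels, 32, 32)
assert conv5(x).shape == conv3x2(x).shape
```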
So Inception V2 then already had 42 layers. It starts with essentially three by three convolutions and three modified inception modules like the one that we just looked at.
Then, in the next stage, an efficient grid size reduction is introduced that uses strided convolutions: you have one by one convolutions for channel compression and then a three by three convolution with stride two to reduce the spatial resolution.
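As a rough sketch of such an efficient grid size reduction (again in PyTorch, with made-up channel numbers; the exact branch widths and number of branches in Inception V2/V3 differ), a compressed, strided convolutional branch runs in parallel with a strided pooling branch and the results are concatenated along the channel dimension:

```python
import torch
import torch.nn as nn

class GridReduction(nn.Module):
    """Sketch of an efficient grid size reduction: halve the spatial
    resolution with a strided 3x3 convolution (after 1x1 channel
    compression) in parallel with a strided pooling branch, then
    concatenate along the channel axis."""
    def __init__(self, in_ch: int, compress_ch: int, out_ch: int):
        super().__init__()
        self.conv_branch = nn.Sequential(
            nn.Conv2d(in_ch, compress_ch, kernel_size=1),   # 1x1 channel compression
            nn.ReLU(inplace=True),
            nn.Conv2d(compress_ch, out_ch, kernel_size=3,
                      stride=2, padding=1),                 # strided 3x3 halves H and W
            nn.ReLU(inplace=True),
        )
        self.pool_branch = nn.MaxPool2d(kernel_size=3, stride=2, padding=1)

    def forward(self, x):
        return torch.cat([self.conv_branch(x), self.pool_branch(x)], dim=1)

# Hypothetical sizes: 288 input channels on a 35x35 feature map.
block = GridReduction(in_ch=288, compress_ch=64, out_ch=192)
y = block(torch.randn(1, 288, 35, 35))
print(y.shape)  # torch.Size([1, 480, 18, 18])
```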