Welcome back to Deep Learning. We want to continue our analysis of regularization
methods, and today I want to talk about classical techniques.
The field has kind of stabilized to the point where some core ideas from the 1980s are still
used today. In fact, my 1987 diploma thesis was all about that.
So here is a typical example of a loss curve over the iterations on the training set, and
what I show here on the right-hand side is the loss curve on the test set.
And you see that although the training loss goes down, the test loss goes up.
So at some point the training data set is overfitted, and the training no longer produces a model
that is representative of the data.
By the way, always keep in mind that the test set must never be used for training.
If you train on your test set, then you will get very good results, but they are very
likely a complete overestimate of the performance.
So there's the typical situation that somebody runs into my office and says: yes, I have
a 99% recognition rate.
The first thing that somebody in pattern recognition or machine learning asks when they read
"99% recognition rate" is: did you train on your test data?
This is the very first thing you make sure has not happened.
Usually it turns out there was some silly mistake, some data set pointer that was not pointing
to the right data set, and suddenly your recognition rate breaks down.
So be careful.
If you have very good results, always scrutinize whether they are really appropriate and whether
they really generalize.
You have to be very careful about this, because training on the test set simply does not give
you a valid estimate of the performance.
So instead, if you want to produce curves like the ones that I'm showing here, you may
want to use a validation set that you split off from the training data set: you never use it in
training, but you can use it to get an estimate of how much your model is overfitting.
So if you do that, then we can already use the first trick: with the validation set, we
observe at what point we have the minimum error on the validation set.
If we are at this point, we can use that as a stopping criterion and take that model
for our test evaluation.
So it's very typical to use the parameters with the minimum validation loss.
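To make this concrete, here is a minimal early-stopping sketch in Python. The names `train_one_epoch` and `validation_loss` are placeholders for your own training and evaluation routines, not functions from the lecture or from any particular library; the loop simply keeps the parameters with the minimum validation loss and stops once the validation loss has not improved for a while.

```python
import copy

def train_with_early_stopping(model, train_one_epoch, validation_loss,
                              max_epochs=100, patience=10):
    """Keep the parameters that achieve the minimum validation loss."""
    best_loss = float("inf")
    best_model = copy.deepcopy(model)
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        train_one_epoch(model)             # one pass over the training set
        val_loss = validation_loss(model)  # loss on the held-out validation set

        if val_loss < best_loss:
            best_loss = val_loss
            best_model = copy.deepcopy(model)  # remember the best parameters so far
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                break  # stop: no improvement for `patience` epochs

    return best_model, best_loss  # the model at the validation minimum
```

In practice you would checkpoint `best_model` to disk and use exactly those parameters for the final test evaluation.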
Another very useful technique is data augmentation.
So the idea here is to artificially enlarge the data set.
Now you may ask: but how?
Well, the idea is that there are transformations under which the label should be invariant.
Let's say you have the image of a cat and you rotate it by 90 degrees, it still shows
a cat.
Obviously, those augmentation techniques have to be done carefully.
So in the right hand example, you can see that a rotation by 180 degrees is probably
not a good way of augmenting because it may switch the label.
There are very common transformations here: random spatial transforms like affine or elastic
transforms, and pixel transforms like changing the resolution, adding noise, or changing pixel
distributions such as color, brightness, and so on.
So these are typical augmentation techniques in image processing.
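As a sketch of how such a pipeline might look in code, assuming you work with torchvision (the specific transforms and ranges below are illustrative choices, not prescribed by the lecture), you could compose spatial and pixel transforms like this; the ranges should be chosen so that no transform can switch the label, as in the 180-degree rotation example above.

```python
import torchvision.transforms as T

# Illustrative augmentation pipeline: small spatial and pixel perturbations
# that keep the class label intact.
augment = T.Compose([
    T.RandomAffine(degrees=15,               # small random rotations
                   translate=(0.05, 0.05),   # small random shifts
                   scale=(0.9, 1.1)),        # small random zoom
    T.ColorJitter(brightness=0.2,            # pixel-level changes:
                  contrast=0.2,              # brightness, contrast,
                  saturation=0.2),           # and color distribution
    T.RandomHorizontalFlip(p=0.5),           # only if a mirrored image keeps its label
    T.ToTensor(),
])

# Applied on the fly during training, e.g.:
# train_set = torchvision.datasets.ImageFolder("data/train", transform=augment)
```

Because the transforms are sampled randomly in every epoch, the network effectively sees a much larger data set without any new images being collected.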
What else?
We can regularize in the loss function.
And here we can see that this essentially leads to maximum a posteriori estimation.
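To sketch why this is the case, take an L2 (weight decay) penalty as the concrete example; the notation below is mine, not taken from the slides. If we read the training loss as a negative log-likelihood and put a Gaussian prior on the weights, then maximizing the posterior is the same as minimizing the regularized loss:

```latex
\hat{\boldsymbol{\theta}}_{\mathrm{MAP}}
  = \arg\max_{\boldsymbol{\theta}} \; p(\boldsymbol{\theta} \mid \mathcal{D})
  = \arg\max_{\boldsymbol{\theta}} \; \log p(\mathcal{D} \mid \boldsymbol{\theta})
      + \log p(\boldsymbol{\theta})
  = \arg\min_{\boldsymbol{\theta}} \;
      \underbrace{-\log p(\mathcal{D} \mid \boldsymbol{\theta})}_{\text{data loss}}
      + \frac{\lambda}{2}\,\|\boldsymbol{\theta}\|_2^2 ,
\quad\text{for } p(\boldsymbol{\theta}) \propto
      \exp\!\left(-\tfrac{\lambda}{2}\|\boldsymbol{\theta}\|_2^2\right).
```

So adding a quadratic regularization term to the loss corresponds to maximum a posteriori estimation with a Gaussian prior on the weights; an L1 term would correspond to a Laplacian prior in the same way.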