Welcome back to deep learning. We want to continue our analysis of regularization methods, and today I want to talk about classical techniques. So here is a typical example of a loss curve over the iterations on the training set. What I want to show you here on the right-hand side is the loss curve on the test set. You see that although the training loss goes down, the test loss goes up. So at some point the model overfits the training data set and no longer represents the underlying data. By the way, always keep in mind that the test set must never be used for training. If you train on your test set, you will always get very good results, but they are likely to be a complete overestimate
of the performance. So there is this typical situation where somebody runs into my office and says, yes, I have a 99% recognition rate. The first thing anybody in pattern recognition or machine learning asks upon reading "99% recognition rate" is: did you train on your test data? This is the very first thing you should make sure has not happened. A simple mistake, such as a data set pointer referencing the wrong data set, can suddenly make your recognition rate jump up. So be careful if you have very good results. Always scrutinize whether they are really appropriate and whether they really generalize.
appropriate and they are really general. So if you want to produce curves like
the ones that I'm showing to you here you may want to use a validation set
that you take off the training data set. You never use the test set in training
and if you would do that you would just get a complete overestimate of
performance and a complete overfit. So if you use a validation set then we can
already use the first trick. If you use it we observe at what point we have the
minimum error in the validation set. If we are at this point we can use it as a
stopping criterion and use that model for our test evaluation. So it's a common
technique to use the parameters with the minimum validation results. Another very
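A minimal sketch of how this early-stopping loop might look in code is shown below; the train_one_epoch and evaluate helpers, the PyTorch-style state_dict calls, and the patience value are illustrative assumptions, not part of the lecture.

```python
import copy

def train_with_early_stopping(model, optimizer, train_loader, val_loader,
                              train_one_epoch, evaluate,
                              max_epochs=100, patience=10):
    """Keep the parameters with the lowest validation error and stop
    once the validation error has not improved for `patience` epochs."""
    best_val_error = float("inf")
    best_weights = copy.deepcopy(model.state_dict())
    epochs_without_improvement = 0

    for epoch in range(max_epochs):
        train_one_epoch(model, optimizer, train_loader)   # one pass over the training set
        val_error = evaluate(model, val_loader)           # error on the held-out validation set

        if val_error < best_val_error:
            best_val_error = val_error
            best_weights = copy.deepcopy(model.state_dict())  # remember the best model so far
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:        # no progress for a while: stop
                break

    model.load_state_dict(best_weights)   # restore the minimum-validation-error parameters
    return model
```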
Another very useful technique is data augmentation. The idea here is to artificially enlarge the data set using transformations to which the class label should be invariant. So let's say you have the image of a cat and you rotate it by 90 degrees: it still shows a cat. Obviously, these augmentation techniques have to be applied carefully. In the example on the right-hand side you can see that a rotation by 180 degrees is probably not a good way of augmenting digits, because it may switch the label. Very common transformations are random spatial transforms or elastic transforms; then there are pixel transformations like changing the resolution, adding noise, or changing pixel distributions such as color and brightness. These are typical augmentation techniques in image processing.
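As a rough sketch of such label-preserving augmentations, here is what an image augmentation pipeline could look like with torchvision; the choice of library and the specific parameter values are my own illustrative assumptions, not something prescribed by the lecture.

```python
from torchvision import transforms

# Label-preserving augmentations for natural images (e.g. the cat example):
# small spatial and pixel-level perturbations that do not change the class.
augment = transforms.Compose([
    transforms.RandomRotation(degrees=15),                 # small rotations only; 180 degrees could
                                                           # turn a "6" into a "9" and switch the label
    transforms.RandomResizedCrop(224, scale=(0.8, 1.0)),   # random spatial crop and rescale
    transforms.ColorJitter(brightness=0.2, contrast=0.2),  # pixel transformations: brightness, contrast
    transforms.ToTensor(),
])

# augmented_image = augment(pil_image)   # applied on the fly to each training image
```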
Well, what else? We can also regularize the loss function. This essentially leads to the so-called maximum a posteriori (MAP) estimation. In the Bayesian approach we treat the weights W as uncertain: they follow a prior distribution p(W). If you have some data set X with associated labels Y, the joint probability p(W, X, Y) can be written as p(W | X, Y) times p(X, Y). We can also reformulate it as p(Y | X, W) times p(X, W). From these two equalities we can derive Bayes' theorem: the conditional probability p(W | X, Y) is p(Y | X, W) times p(X, W), divided by p(X, Y). We can rearrange this a bit further: p(X, W) factorizes into p(X) p(W) and p(X, Y) into p(Y | X) p(X), so the terms p(X) and p(Y | X) pop up. By removing the terms that are independent of W, this yields the MAP estimate. So we can actually seek the weights W that maximize this posterior, which amounts to minimizing the negative log-likelihood plus a regularization term that comes from the prior.
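Written out, the derivation just sketched can be summarized as follows, assuming the inputs X are independent of the weights W; this is my own compact rendering of the steps, not a slide from the lecture.

```latex
\begin{align}
p(W, X, Y) &= p(W \mid X, Y)\, p(X, Y) = p(Y \mid X, W)\, p(X, W) \\
p(W \mid X, Y) &= \frac{p(Y \mid X, W)\, p(X, W)}{p(X, Y)}
               = \frac{p(Y \mid X, W)\, p(W)\, p(X)}{p(Y \mid X)\, p(X)} \\
\hat{W}_{\mathrm{MAP}} &= \operatorname*{argmax}_W \; p(Y \mid X, W)\, p(W)
                        = \operatorname*{argmin}_W \; \big[ -\log p(Y \mid X, W) - \log p(W) \big]
\end{align}
```

The term -log p(W) is exactly the regularization term added to the loss; for a zero-mean Gaussian prior on the weights it reduces to the familiar L2 (weight-decay) penalty.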