Welcome back to deep learning. So today we want to talk about regularization techniques, and we'll start with a short introduction to regularization and the general problem of overfitting. You can see here that we will first look at the background, i.e., what the problem of regularization is, then we talk about classical techniques: normalization, dropout, initialization, transfer learning, which is a very common one, and multi-task learning.
So why are we talking about this topic so much? Well, if you want to fit your data, then problems like these are easy to fit because they have a clear solution. But typically you have the problem that your data is noisy and you cannot easily separate the classes. What you then run into is the problem of underfitting: if you have a model that doesn't have a very high capacity, then you may get something like this line here, which is not a very good fit to describe the separation of the classes. The opposite is overfitting. Here we have models with very high capacity. These high-capacity models try to model everything that they observe in the training data, and this may yield decision boundaries that are not very reasonable. What we are actually interested in is a sensible boundary that is somehow a compromise between the observed data and the actual ground truth representation.
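To make this concrete, here is a minimal sketch of the effect, not taken from the lecture: the two-class toy data, the class means, and the choice of scikit-learn's k-nearest-neighbor classifier are my own assumptions. A very flexible classifier (k = 1) memorizes the noisy training points, while a very rigid one (large k) underfits; a moderate k is the sensible compromise.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Two noisy, overlapping 2-D classes, similar in spirit to the slide's example.
n = 200
X = np.vstack([rng.normal([0.0, 0.0], 1.0, size=(n, 2)),
               rng.normal([1.5, 1.5], 1.0, size=(n, 2))])
y = np.repeat([0, 1], n)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5, random_state=0)

for k in (1, 15, 150):  # very flexible, moderate, very rigid decision boundary
    clf = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train)
    print(f"k={k:3d}: train acc = {clf.score(X_train, y_train):.2f}, "
          f"test acc = {clf.score(X_test, y_test):.2f}")
```

With k = 1 the training accuracy is essentially perfect while the test accuracy is noticeably lower, which is the overfitting behavior described above; a very large k smooths the boundary so much that the model starts to underfit.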
bias-variance decomposition and here we stick to regression where we have an
ideal function h and this has some value and it's typically associated with some
measurement noise. So there's some additional value epsilon that is added
to h of x and this then may be distributed normally with a zero mean
and a standard deviation of sigma epsilon. Now you can go ahead and use a
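Written as a formula, in my own notation for what is said here, with y denoting the noisy observation:

```latex
y = h(x) + \epsilon, \qquad \epsilon \sim \mathcal{N}\!\left(0, \sigma_\epsilon^2\right)
```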
Now you can go ahead and use a model to estimate h; this is f hat, which is estimated from some data set D. We can then express the loss at a single point as the expected value of the loss, and here this is simply the L2 loss: we take the true function minus the estimated function, square the difference, and compute the expected value to yield this loss.
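As an equation, again in my own notation, with the expectation taken over the draw of the data set D and the noise:

```latex
L(x) = \mathbb{E}_{D,\epsilon}\!\left[ \big( y - \hat{f}(x; D) \big)^2 \right]
```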
Interestingly, this loss can be shown to decompose into two parts. There is the bias, which is essentially the deviation of the expected value of our model from the true model, so it measures how far we are off on average. The other part can be explained by the limited size of the data set: we can always try to find a very flexible model that reduces this bias, but what we get in return is an increase in variance. The variance is the expected value of the squared deviation of y hat from its own expected value, so it is nothing else than the variance that we encounter in y hat. And then, of course, there is a small irreducible error.
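Putting these pieces together, the decomposition of the pointwise loss reads, in the same notation as above:

```latex
\mathbb{E}_{D,\epsilon}\!\left[ \big( y - \hat{f}(x; D) \big)^2 \right]
  = \underbrace{\big( \mathbb{E}_D[\hat{f}(x; D)] - h(x) \big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[ \big( \hat{f}(x; D) - \mathbb{E}_D[\hat{f}(x; D)] \big)^2 \right]}_{\text{variance}}
  + \underbrace{\sigma_\epsilon^2}_{\text{irreducible error}}
```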
Now we can integrate this over every data point x, and we get the loss for the entire data set from the losses at the single points. By the way, a similar decomposition exists for classification using the 0-1 loss, which you can see in reference 9; it is slightly different, but it has similar implications. So we learn that with an increase in variance we can essentially reduce the bias, that is, the prediction error of our model on the training data set.
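The trade-off can also be made tangible with a small simulation, which is again my own sketch rather than part of the lecture: we repeatedly draw training sets of limited size, fit a low-capacity and a high-capacity polynomial with plain NumPy, and estimate squared bias and variance at fixed test points; the ground-truth function and the polynomial degrees are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def h(x):
    """Noise-free ground-truth function."""
    return np.sin(2 * np.pi * x)

x_test = np.linspace(0.0, 1.0, 50)          # fixed evaluation points
n_datasets, n_samples, sigma_eps = 200, 25, 0.3

for degree in (1, 9):                       # low vs. high model capacity
    preds = np.empty((n_datasets, x_test.size))
    for d in range(n_datasets):
        # Draw a fresh training set D of limited size.
        x_train = rng.uniform(0.0, 1.0, size=n_samples)
        y_train = h(x_train) + rng.normal(0.0, sigma_eps, size=n_samples)
        coeffs = np.polyfit(x_train, y_train, deg=degree)
        preds[d] = np.polyval(coeffs, x_test)
    mean_pred = preds.mean(axis=0)                    # E_D[f_hat(x; D)]
    bias_sq = np.mean((mean_pred - h(x_test)) ** 2)   # squared bias, averaged over x
    variance = np.mean(preds.var(axis=0))             # variance, averaged over x
    print(f"degree {degree}: bias^2 = {bias_sq:.3f}, variance = {variance:.3f}")
```

The degree-1 model should show a large squared bias but a small variance, whereas the degree-9 model drives the bias down at the cost of a clearly larger variance, which is exactly the trade-off stated above.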
Let's visualize this a bit. On the top left we see a low-bias, low-variance model: it is essentially always right and doesn't have a lot of noise in its predictions. On the top right we see a high-bias, low-variance model that is very consistent, so there is no variance, but it is consistently off. On the bottom left we see a low-bias, high-variance model: it has a considerable degree of variation, but on average it is very close to where it is supposed to be. And on the bottom right we have the case that we want to avoid: a high-bias, high-variance model, which has a lot of scatter and is also far off the target.
Deep Learning - Regularization Part 1
This video discusses the problem of over- and underfitting. In order to get a better understanding, we explore the bias-variance trade-off and look into the effects of training data size and number of parameters.
Further Reading:
A gentle Introduction to Deep Learning