Welcome back to Deep Learning.
So, today we want to talk
about regularization
techniques and we'll start
with a short introduction to
regularization and
the general problems it addresses.
Remember the pre-deep-learning textbooks that I told you about:
they say you have no guarantee of convergence,
all those things that you read in textbooks
where they tell you to stay away from this.
And they're all wrong.
So you can see here that we first start with the background:
what is the problem that regularization addresses? We then talk about classical techniques, normalization,
dropout, initialization, transfer learning, which is a very common one
and has in principle been done for many decades, and multitask learning.
So, why are we talking about this topic so much? Well, if you
want to fit your data, then problems like these are easy to fit
because they have a clear solution. Typically, however, your
data is noisy and you cannot easily separate the classes. What you then run into
is the problem of underfitting: if you have a model that does not have a very
high capacity, you may end up with something like the boundary shown here,
which is not a very good fit to describe the separation of the classes.
The contrary is overfitting.
So here we have models with very high capacity.
Now these high capacity models try to model everything that they observe in the training
data.
And this may yield decision boundaries that are not very reasonable.
What we are actually interested in is a sensible boundary
that is somehow a compromise between the observed data and the actual ground-truth representation.
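The lecture's figures show classification boundaries; a minimal sketch of the same effect on a toy 1-D regression problem looks like this (the function, noise level, and polynomial degrees below are illustrative choices, not taken from the lecture). The polynomial degree plays the role of model capacity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy ground truth h(x) = sin(2*pi*x), observed with Gaussian noise.
x_train = np.sort(rng.uniform(0.0, 1.0, size=30))
y_train = np.sin(2.0 * np.pi * x_train) + rng.normal(0.0, 0.3, size=x_train.shape)

x_test = np.linspace(0.0, 1.0, 200)
y_true = np.sin(2.0 * np.pi * x_test)

# Polynomial degree acts as model capacity.
for degree in (1, 4, 10):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    y_pred = np.polyval(coeffs, x_test)
    mse = np.mean((y_pred - y_true) ** 2)
    print(f"degree {degree:2d}: mean squared error vs. ground truth = {mse:.3f}")

# Typical outcome: degree 1 underfits (high bias), degree 10 chases the noisy
# training points (high variance), degree 4 is a sensible compromise.
```

The low-degree fit stays far from the truth everywhere, while the high-degree fit changes drastically with every new noisy training set; these two failure modes are exactly the bias and variance discussed next.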
So, we can analyze this problem with the so-called bias-variance decomposition.
Here we stick to regression, where we have an ideal function h. Its values are
typically associated with some measurement noise, so there is an additional value
epsilon added to h(x), and this may be distributed normally with zero mean
and a standard deviation of sigma epsilon. Now you can go ahead and use a model to estimate
h; this is f hat, which is estimated from some data set D. We can then express the loss
for a single point as an expected value, and here this would simply be the
L2 loss: we take the true function minus the estimated function to the
power of two and compute the expected value to yield this loss.
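Written as formulas (a reconstruction of the spoken description, not a copy of the slide; f hat of x given D denotes the model estimated from a training set D, and the expectation is taken over these training sets, which matches the two-part decomposition that follows):

```latex
% Regression with additive Gaussian measurement noise:
y = h(x) + \epsilon, \qquad \epsilon \sim \mathcal{N}\!\left(0, \sigma_\epsilon^{2}\right)

% Expected L2 loss of the estimate at a single point x:
L = \mathbb{E}_{\mathcal{D}}\!\left[\left(h(x) - \hat{f}(x;\mathcal{D})\right)^{2}\right]
```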
Interestingly, this loss can be shown to be decomposable into two parts. The first part is
the bias, which is essentially the deviation of the expected value of our
model from the true model, so it essentially measures how far we are off.
The other part can be explained by the limited size of the data set.
We can always try to find a model that is very flexible and tries to
reduce this bias. What we buy this with, however, is an
increase in variance. The variance is the expected value of the squared
difference between y hat and the expected value of y hat.
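In symbols, this is the standard bias-variance identity (again a reconstruction rather than a copy of the slide; y hat of x stands for the prediction of the model trained on D):

```latex
\mathbb{E}_{\mathcal{D}}\!\left[\left(h(x) - \hat{y}(x)\right)^{2}\right]
= \underbrace{\left(h(x) - \mathbb{E}_{\mathcal{D}}\!\left[\hat{y}(x)\right]\right)^{2}}_{\text{bias}^{2}}
+ \underbrace{\mathbb{E}_{\mathcal{D}}\!\left[\left(\hat{y}(x) - \mathbb{E}_{\mathcal{D}}\!\left[\hat{y}(x)\right]\right)^{2}\right]}_{\text{variance}}
```

If the comparison is made against the noisy observation y instead of h(x), an additional irreducible noise term sigma epsilon squared appears on the right-hand side.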
Deep Learning - Regularization Part 1
This video discusses the problem of over- and underfitting. In order to get a better understanding, we explore the bias-variance trade-off and look into the effects of training data size and number of parameters.
Video References:
Lex Fridman's Channel
Further Reading:
A gentle Introduction to Deep Learning