17 - Deep Learning - Regularization Part 1 [ID:15181]

Welcome back to Deep Learning.

So, today we want to talk about regularization techniques, and we'll start with a short introduction to regularization and the general problems of the pre-deep-learning textbooks that I told you about: you have no guarantee of convergence, you know, all those things that you read in textbooks where they tell you to stay away from this. And they're all wrong.

So you can see here that we first start with the background: what is the problem that regularization addresses? Then we talk about classical techniques, normalization, dropout, initialization, transfer learning, which is a very common one and has in principle been done for many decades, and multitask learning. So, why are we talking about this topic so much? Well, if you want to fit your data, then problems like these are easy to fit because they have a clear solution. But typically you have the problem that your data is noisy and you cannot easily separate the classes. What you then run into is the problem of underfitting: if you have a model that doesn't have a very high capacity, you may get a decision boundary like the one shown here, which is not a very good fit to describe the separation of the classes.

The opposite is overfitting. Here we have models with very high capacity. These high-capacity models try to model everything that they observe in the training data, and this may yield decision boundaries that are not very reasonable. What we are actually interested in is a sensible boundary that is somehow a compromise between the observed data and the actual ground-truth representation.
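To make the capacity argument concrete, here is a small illustrative sketch (not part of the lecture): noisy samples of a smooth target are fitted with polynomials of increasing degree, where the target function, noise level, and degrees are arbitrary choices for the demonstration. A degree that is too low underfits, a very high degree fits the noise, and an intermediate degree is the compromise.

```python
import numpy as np

rng = np.random.default_rng(0)

# Noisy training samples of a smooth underlying function (stand-in for the ideal target).
x_train = np.linspace(0.0, 1.0, 30)
y_train = np.sin(2.0 * np.pi * x_train) + rng.normal(0.0, 0.3, size=x_train.shape)

# Noise-free evaluation grid to measure how well each fit recovers the true function.
x_test = np.linspace(0.0, 1.0, 200)
y_true = np.sin(2.0 * np.pi * x_test)

for degree in (1, 4, 15):  # low, moderate, and very high capacity
    coeffs = np.polyfit(x_train, y_train, degree)   # least-squares polynomial fit
    y_hat = np.polyval(coeffs, x_test)
    mse = np.mean((y_hat - y_true) ** 2)
    print(f"degree {degree:2d}: test MSE = {mse:.3f}")

# Typical outcome: degree 1 underfits, degree 15 chases the noise (overfits),
# and degree 4 gives the sensible compromise.
```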

We can analyze this problem with the so-called bias-variance decomposition. Here we stick to regression, where we have an ideal function h. Its value is typically associated with some measurement noise, so there is an additional value epsilon added to h of x, and this may be normally distributed with zero mean and a standard deviation of sigma epsilon. Now you can go ahead and use a model to estimate h, and this is f hat, which is estimated from some data set D. We can then express the loss for a single point as the expected value of the loss, and here this is simply the L2 loss: we take the true function minus the estimated function to the power of two and compute the expected value to yield this loss. Interestingly, this loss can be shown to decompose into two parts. One is the bias, which is essentially the deviation of the expected value of our model from the true model, so it measures how far we are off. The other part can be explained by the limited size of the data set: we can always try to find a model that is very flexible and tries to reduce this bias, but what we buy as a result is an increase in variance. The variance is the expected value of the squared difference between y hat and the expected value of y hat.
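For reference, the decomposition described above can be written out as follows; this is the standard formulation matching the spoken description, with the irreducible noise term sigma epsilon squared appearing as an additional component that no model can remove.

```latex
% Observation model: y = h(x) + \epsilon, with \epsilon \sim \mathcal{N}(0, \sigma_\epsilon^2).
% Expected L2 loss of the estimate \hat{f}(x; D) trained on data set D:
\mathbb{E}_{D,\epsilon}\!\left[\bigl(y - \hat{f}(x; D)\bigr)^2\right]
  = \underbrace{\bigl(h(x) - \mathbb{E}_D[\hat{f}(x; D)]\bigr)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\bigl(\hat{f}(x; D) - \mathbb{E}_D[\hat{f}(x; D)]\bigr)^2\right]}_{\text{variance}}
  + \underbrace{\sigma_\epsilon^2}_{\text{noise}}
```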

Part of a video series:
Accessible via: Open access
Duration: 00:11:09 min
Recording date: 2020-05-07
Uploaded on: 2020-05-07 01:16:32
Language: en-US

Deep Learning - Regularization Part 1

This video discusses the problem of over- and underfitting. In order to get a better understanding, we explore the bias-variance trade-off and look into the effects of training data size and number of parameters.

Video References:
Lex Fridman's Channel

Further Reading:
A gentle Introduction to Deep Learning

Tags

introduction, artificial intelligence, deep learning, machine learning, pattern recognition, overfitting, bias-variance tradeoff