18 - Deep Learning - Plain Version 2020 [ID:21113]

Welcome back to deep learning. We want to continue our analysis of regularization methods, and today I want to talk about classical techniques. Here is a typical example of a loss curve over the iterations on the training set, and on the right-hand side you see the loss curve on the test set. Although the training loss keeps going down, the test loss goes up again. So at some point the model overfits the training data set and no longer represents the underlying data. By the way, always keep in mind that the test set must never be used for training. If you train on your test set, you will always get very good results, but they are likely to be a complete overestimate of the performance.

There is this typical situation where somebody runs into my office and says: "I have a 99% recognition rate." The first thing that somebody in pattern recognition or machine learning asks on hearing about a 99% recognition rate is: "Did you train on your test data?" This is the very first thing you should make sure has not happened. If you made a simple mistake and some data set pointer was not pointing into the right data set, your recognition rate suddenly jumps up. So be careful if you have very good results; always scrutinize that they are really appropriate and that they really generalize.

If you want to produce curves like the ones I am showing here, you may want to use a validation set that you split off from the training data set. You never use the test set during training; if you did, you would just get a complete overestimate of the performance and a complete overfit. With a validation set we can already use the first trick: we observe at which point the error on the validation set is minimal, use that point as a stopping criterion, and take that model for the test evaluation. So it is a common technique, known as early stopping, to keep the parameters with the minimum validation error; a small sketch follows below.
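As a rough illustration of this idea (not code from the lecture), an early-stopping training loop could look like the following Python sketch. The model, train_one_epoch, and evaluate names are hypothetical placeholders, and the state_dict calls assume a PyTorch-style model.

```python
import copy

def train_with_early_stopping(model, train_one_epoch, evaluate,
                              val_set, max_epochs=100, patience=10):
    """Keep the parameters that achieve the minimum validation loss."""
    # val_set is split off from the training data -- never from the test set.
    best_loss = float("inf")
    best_state = copy.deepcopy(model.state_dict())  # assumes a PyTorch-style model
    epochs_without_improvement = 0

    for _ in range(max_epochs):
        train_one_epoch(model)               # one pass over the training data
        val_loss = evaluate(model, val_set)  # error on the held-out validation set

        if val_loss < best_loss:             # new minimum: remember these weights
            best_loss = val_loss
            best_state = copy.deepcopy(model.state_dict())
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1

        if epochs_without_improvement >= patience:
            break                            # validation loss stopped improving

    model.load_state_dict(best_state)        # restore the best parameters
    return model
```

The patience parameter simply tolerates a few epochs without improvement before stopping, a common practical variant of taking the plain validation minimum.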

Another very useful technique is data augmentation. The idea here is to artificially enlarge the data set by applying transformations to the inputs under which the class label should be invariant. Let's say you have the image of a cat: if you rotate it by 90 degrees, it still shows a cat. Obviously, these augmentation techniques have to be applied carefully. In the right-hand example you can see that a rotation by 180 degrees is probably not a good way of augmenting digits, because it may switch the label (a rotated 6 looks like a 9). Very common transformations are random spatial transforms or elastic transforms; then there are pixel transformations like changing the resolution, adding noise, or changing pixel distributions such as color and brightness. These are typical augmentation techniques in image processing; a small example follows below.
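As a hedged sketch (not from the lecture), such an augmentation pipeline could be written with torchvision; the parameter values here are illustrative assumptions, not recommendations from the video.

```python
from torchvision import transforms

# Random spatial and pixel transformations, applied on the fly during training.
augment = transforms.Compose([
    transforms.RandomAffine(degrees=15,             # small random rotations
                            translate=(0.1, 0.1),   # random spatial shifts
                            scale=(0.9, 1.1)),      # random rescaling
    transforms.ColorJitter(brightness=0.2,          # pixel-distribution changes:
                           contrast=0.2),           # brightness and contrast
    transforms.ToTensor(),
])

# Usage: augmented = augment(img) for a PIL image `img`.
# Keeping the rotation range small avoids label switches such as 6 -> 9.
```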

Well, what else? We can regularize the loss function itself. This is essentially the so-called maximum a posteriori (MAP) estimation. We can derive it in the Bayesian approach, where we consider uncertain weights W that follow a prior distribution P(W). If you have a data set X with associated labels Y, the joint probability P(W, X, Y) can be written as P(W | Y, X) times P(Y, X), and we can reformulate it as P(Y | X, W) times P(X, W). From these two equalities we can derive Bayes' theorem: the conditional probability P(W | Y, X) equals P(Y | X, W) times P(X, W) divided by P(Y, X). If we rearrange this a bit further, the terms P(X) and P(Y | X) pop up; they are independent of W, and removing them yields the MAP estimate. So we can actually seek the weights W that maximize P(Y | X, W) times the prior P(W).
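Written out compactly (my restatement of the steps above, additionally assuming the weights W are independent of the inputs X, so that P(X, W) = P(X) P(W)):

```latex
\begin{align}
P(W, X, Y) &= P(W \mid Y, X)\, P(Y, X) = P(Y \mid X, W)\, P(X, W) \\
P(W \mid Y, X) &= \frac{P(Y \mid X, W)\, P(X, W)}{P(Y, X)}
              = \frac{P(Y \mid X, W)\, P(W)\, P(X)}{P(Y \mid X)\, P(X)} \\
\hat{W}_{\mathrm{MAP}} &= \arg\max_{W}\; P(Y \mid X, W)\, P(W)
                       = \arg\min_{W}\; \bigl[ -\log P(Y \mid X, W) - \log P(W) \bigr]
\end{align}
```

In the negative-log form, the term $-\log P(W)$ is what expands the loss function by a regularization term, as the video description summarizes.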


Deep Learning - Regularization Part 2

This video discusses classical regularization techniques such as early stopping using a validation set, augmentation, and maximum a posteriori methods that expand the loss function with a regularization term.


