6 - Deep Learning [ID:12345]

Welcome back to our lecture on deep learning. Today we will talk about common practices.

So we will have a short recap, then we will talk about training strategies, optimization and learning rate, architecture selection and hyperparameter optimization, and ensembling, which is a very common technique to boost the performance a little more.

We will then talk about class imbalance and finally about evaluation.

So let's have a short recap. So far we have essentially talked about all the nuts and bolts of how to train a network. We looked into fully connected and convolutional layers, we looked at the activation functions, and we've seen that there are different choices and how the activation functions can introduce problems. Then we also looked into regularization; you remember the rectified linear unit only ever produces non-negative outputs, and we looked into things like batch normalization to counter this effect. We looked into various loss functions, and we've seen that you need particular loss functions for particular tasks: for regression you want to go with an L2 loss, and for classification you can go with a cross-entropy loss, for example. Note that those two loss functions are also very important if you want to prepare for the oral exam. Then we looked into optimization and how to go ahead with the different optimizers.
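As a small illustration of those two loss functions, here is a minimal sketch in PyTorch; the lecture does not prescribe a framework, and the tensor values are made up for the example:

import torch
import torch.nn as nn

# L2 (mean squared error) loss for a regression task
mse = nn.MSELoss()
prediction = torch.tensor([2.5, 0.0, 1.8])   # network outputs
target = torch.tensor([3.0, -0.5, 2.0])      # ground-truth values
print(mse(prediction, target))               # mean of the squared differences

# Cross-entropy loss for a 3-class classification task
ce = nn.CrossEntropyLoss()                   # expects raw logits, not softmax outputs
logits = torch.tensor([[1.2, 0.3, -0.8]])    # one sample, three class scores
label = torch.tensor([0])                    # index of the correct class
print(ce(logits, label))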

You didn't even realize that the microphone had gone silent, so either I'm yelling loudly enough or you're not listening; I don't want to know the answer. So let's continue, and this is important, we even have the microphone reconnected now such that everybody can understand it.

First things first: the test data has to be separated from your data set, and you put it into a vault. You do not look into the test data. Make sure that your test data set is representative, so you want it to contain the modes of variation that you expect in unseen data. Make sure, for example, that the same persons do not appear in the training data set and in the test data set: make it disjoint by persons, and disjoint by any mode of variation that you expect in unseen data. If you expect new scanners or new recording devices in your test case, then please also adjust the test data set accordingly. This is very important, because if you train on conditions that also appear in the test data set, you get biased results. The same is true if you do your parameter optimization using the test data set: you get results that are too optimistic, so you don't want to do that. The test data goes away at the very beginning, before you start any training; you decide on the test data set and take it away. Very important: you only bring it out at the very end.
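To make the split disjoint by persons in practice, you can split on a group identifier instead of on individual samples. Here is a minimal sketch using scikit-learn's GroupShuffleSplit; the dummy data and the person_id array are assumptions for the example:

import numpy as np
from sklearn.model_selection import GroupShuffleSplit

X = np.random.rand(10, 4)                              # 10 samples, 4 features (dummy data)
y = np.random.randint(0, 2, size=10)                   # dummy binary labels
person_id = np.array([0, 0, 1, 1, 2, 2, 3, 3, 4, 4])   # one group per person

# Hold out roughly 20% of the persons, not 20% of the samples
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=person_id))

# No person appears on both sides of the split
assert set(person_id[train_idx]).isdisjoint(set(person_id[test_idx]))

The same idea applies to any other mode of variation: use the scanner or the recording device as the group identifier if that is what you expect to change at test time.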

Good. Then, of course, overfitting is extremely easy with neural networks; there is for example reference [5] that we have seen many times: even with random labels you can overfit on ImageNet. The generalization error will be underestimated substantially if you use the test data set for model selection. If you do repeated tests with the same test data set, this can happen even though you never technically looked into the test data: you only used it for evaluation, but you end up choosing a specific random initialization or specific parameter sets that give good results on that particular test data set. You think you have them separated, but merely by repeatedly evaluating against the same data set you get optimistic results. Don't do that. In this case it is very easy to get very good results, but they won't generalize; the system will break down as soon as you look at new data, and you don't want that to happen, because then you are not building a good system.
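To see the effect described in reference [5] on a toy scale, you can train a network on data whose labels are pure noise and watch the training accuracy approach 100% anyway. A minimal sketch, using a small multilayer perceptron on random inputs rather than ImageNet; all sizes are arbitrary choices for the illustration:

import torch
import torch.nn as nn

torch.manual_seed(0)
X = torch.randn(256, 32)                  # random inputs
y = torch.randint(0, 10, (256,))          # completely random labels

model = nn.Sequential(nn.Linear(32, 512), nn.ReLU(), nn.Linear(512, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(2000):                  # enough steps to memorize the noise
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

train_acc = (model(X).argmax(dim=1) == y).float().mean()
print(train_acc)                          # close to 1.0: the labels were memorized

The point is that a near-perfect training score tells you nothing here; only a properly held-out test set would reveal that the model learned nothing generalizable.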

Then, choosing the architecture is the first element of model selection, and you never do that with respect to your test set, because then you would have to repeat all the other steps, having already looked into the test set at that point. So you want to choose the architecture without using the test set. What you should also do is run all of your initial experimentation on a smaller subset of the data, because in the beginning you want to experiment a lot: you want to look into different architectures and find out what may or may not work, and you don't want to run this on terabytes of data, because that will cost you a lot of time. So at the very beginning you choose a very, very small data set and do some experiments to see whether the task can actually be learned.
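A common form of this early experiment is to check that the model can overfit a tiny subset: if it cannot drive the loss close to zero on a handful of samples, the task is likely not learnable with this setup, or there is a bug. A minimal sketch under the same toy assumptions as before:

import torch
import torch.nn as nn

torch.manual_seed(0)
X_small = torch.randn(16, 32)             # tiny subset: 16 samples
y_small = torch.randint(0, 10, (16,))

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(500):
    opt.zero_grad()
    loss = loss_fn(model(X_small), y_small)
    loss.backward()
    opt.step()

print(loss.item())                        # should be near zero; if not, investigate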

So let's look into some training strategies. The first thing you should do is check whether the implementation is correct. For example, if you implement your own layer or your own network, you can compare the analytic gradient against a numerical gradient, because you want to make sure that there is no bug in the implementation. Even if you download a framework where everything is already implemented, you should still be careful.
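A standard way to do this comparison is a finite-difference check: perturb each parameter by a small epsilon and compare the resulting numerical gradient with the analytic one. A minimal NumPy sketch; the quadratic f below is just a stand-in for whatever loss your own layer computes:

import numpy as np

def f(w):
    return np.sum(w ** 2)              # stand-in loss with a known gradient

def analytic_grad(w):
    return 2 * w                       # derivative of sum(w^2)

def numerical_grad(f, w, eps=1e-5):
    grad = np.zeros_like(w)
    for i in range(w.size):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[i] += eps
        w_minus[i] -= eps
        grad[i] = (f(w_plus) - f(w_minus)) / (2 * eps)   # central difference
    return grad

w = np.random.randn(5)
g_analytic = analytic_grad(w)
g_numeric = numerical_grad(f, w)

# The relative error should be tiny (around 1e-7); a large value indicates a bug
rel_error = np.abs(g_analytic - g_numeric) / np.maximum(1e-8, np.abs(g_analytic) + np.abs(g_numeric))
print(rel_error.max())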
