6 - Deep Learning

So welcome everybody, welcome back to deep learning. Today we want to talk about common practices. These are essentially some guidelines and some tips and tricks for what you can do in order to stabilize the training, and the hints in this set of slides will be really useful when you are trying to train a network from scratch. So let's have a first recap, then let's look at training strategies, then we look into optimization and learning rates, architecture selection, hyperparameter optimization, as well as a strategy called ensembling. Then another problem that quite often occurs is class imbalance, which we also look into, and in the end the evaluation, which is also a very important point.

So let's recap: so far we've seen how to train a network. We know essentially fully connected and convolutional layers, we know the activation functions, and we know the loss functions that we use during the training phase. Then we've seen different optimization strategies: we're essentially using a gradient descent optimizer, but you can use momentum and other strategies to stabilize this (sketched briefly below). And in the last lecture we also looked into regularization and some techniques to ensure that your network is actually going in the right direction, that you're not learning correlated feature maps and things like that.
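Just as a reminder of what that momentum update looks like, here is a minimal sketch in plain Python; the learning rate, the momentum value, and the grad_loss function are placeholders for illustration, not values fixed by the lecture.

```python
import numpy as np

def sgd_momentum_step(w, v, grad_loss, lr=0.01, momentum=0.9):
    """One gradient descent step with (heavy-ball) momentum."""
    g = grad_loss(w)             # gradient of the loss at the current parameters
    v = momentum * v - lr * g    # velocity: decaying memory of previous steps
    w = w + v                    # move the parameters along the velocity
    return w, v

# Toy usage: minimize f(w) = ||w||^2, whose gradient is 2w.
w, v = np.ones(3), np.zeros(3)
for _ in range(100):
    w, v = sgd_momentum_step(w, v, grad_loss=lambda w: 2 * w)
```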

So today we will see how to choose architectures and how to train and evaluate a deep neural network. And first things first: the test data. The very first thing you do when you start training a network is you put your test data in a vault, in a safe; you take the test data away. We will later work with validation and training data, but throughout the entire choice of the architecture, the fine-tuning and so on, you keep the test data away; make sure of that. Don't experiment with your test data during the design of your network or during the parameter optimization phase, because if you do, you get overly optimistic results. And don't put your test data into the training set, because then you will simply learn the test data and of course be able to predict it as well. So don't do that. That's very, very important.

Also be careful how you select your test data. The test data should be a representative view of your data. And make sure, let's say you have a speech recognition problem, for example, and you train with 60 persons: if you really aim to build a speaker-independent system, then you should not have the same persons in training and test. They should be disjoint, because otherwise you again get optimistic results; you will get accuracies that are higher than what you will later see in the actual use case. So select the test data carefully and keep it in a vault.
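To make the "disjoint persons" idea concrete, here is a small sketch of a speaker-disjoint split; the data, the speaker IDs, and the 20% test fraction are made up for illustration, and scikit-learn's GroupShuffleSplit is just one way to do it.

```python
import numpy as np
from sklearn.model_selection import GroupShuffleSplit

# Hypothetical data: features, labels, and one speaker ID per utterance.
X = np.random.randn(1000, 40)                       # e.g. 40 acoustic features
y = np.random.randint(0, 10, size=1000)             # e.g. 10 word classes
speaker_ids = np.random.randint(0, 60, size=1000)   # 60 speakers

# Hold out roughly 20% of the *speakers* (not samples) for testing,
# so training and test speakers stay disjoint.
gss = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(gss.split(X, y, groups=speaker_ids))

assert set(speaker_ids[train_idx]).isdisjoint(set(speaker_ids[test_idx]))
```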

And already when you select the test data at the very beginning, you have to have considered how you want to evaluate your classifier or your system in the end. So that is already important when you choose the test data. Typically you do splits: you can take some 50% or 60% of the data for training and share the remaining data between validation and test. Okay, so take the test data away and keep it in a vault; we will only look at this data again when we are evaluating the actual system.
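As a minimal sketch of such a split, assuming 60% training and the remaining 40% shared equally between validation and test (the sample count and file name are only illustrative):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n_samples = 10_000
indices = rng.permutation(n_samples)     # shuffle once, then cut into three parts

n_train = int(0.6 * n_samples)           # 60% training
n_val = int(0.2 * n_samples)             # 20% validation, the rest is test

train_idx = indices[:n_train]
val_idx = indices[n_train:n_train + n_val]
test_idx = indices[n_train + n_val:]

# "Put the test data in a vault": store the test indices once and do not
# look at them again until the final evaluation.
np.save("test_indices.npy", test_idx)
```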

Obviously, one reason why you have to do this is that overfitting is extremely easy with neural networks. There are even papers that work with ImageNet, a really large database, using random labels, and they could show that networks can even learn those random labels, which are supposedly meaningless. You should not be able to predict random labels, because they carry no information; you just randomly assign a new label to every sample. But if you don't set things up correctly, you will also be able to predict random labels, which means that your classifier or your system has essentially just overfit, and any interpretation of the result is meaningless. Okay, so if you don't keep the test data away, you will substantially underestimate the error, and obviously you also don't want to choose the architecture on your test data. Don't do it on the test data.
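If you want to run this kind of sanity check on your own pipeline, one simple variant is to shuffle the training labels and confirm that the accuracy on properly held-out data drops to chance level; if it stays high, something is leaking. This is only a sketch, with train_model and evaluate standing in for whatever training and evaluation code you actually use.

```python
import numpy as np

def random_label_sanity_check(X_train, y_train, X_val, y_val, train_model, evaluate):
    """Train on shuffled (meaningless) labels and report held-out accuracy.

    With random labels, validation accuracy should sit near chance level
    (1 / number of classes). If it is much higher, the evaluation is very
    likely leaking training or test information.
    """
    rng = np.random.default_rng(seed=0)
    y_shuffled = rng.permutation(y_train)     # destroys any feature-label relation

    model = train_model(X_train, y_shuffled)  # user-supplied training routine
    acc = evaluate(model, X_val, y_val)       # accuracy on untouched validation data

    chance = 1.0 / len(np.unique(y_train))
    print(f"accuracy with random labels: {acc:.3f} (chance is about {chance:.3f})")
    return acc
```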

And another thing that is very useful: if you start experimenting with different architectures, different parameter sets, debugging and so on, do it on a smaller subset of the data. Don't do it with the full data set. If you need a week to train on the full data set and then you realize that some parameter x or y was not set correctly, you will have lost that entire week.
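One way to set this up, assuming a PyTorch-style dataset (all names and sizes here are placeholders), is to draw a small fixed random subset once and run all quick experiments against it:

```python
import torch
from torch.utils.data import TensorDataset, Subset, DataLoader

# Stand-in for your real dataset: 10,000 samples with 100 features, 10 classes.
full_dataset = TensorDataset(torch.randn(10_000, 100),
                             torch.randint(0, 10, (10_000,)))

# Draw a small, fixed random subset for fast architecture/parameter experiments.
generator = torch.Generator().manual_seed(0)
subset_indices = torch.randperm(len(full_dataset), generator=generator)[:1_000]
debug_set = Subset(full_dataset, subset_indices.tolist())

debug_loader = DataLoader(debug_set, batch_size=32, shuffle=True)

# Iterate quickly here; switch back to the full dataset only once the
# setup and hyperparameters look sane.
```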
