So welcome everybody, welcome back to deep learning. Today we want to talk about common practices: essentially some guidelines, tips, and tricks that you can use to stabilize training. The hints in this set of slides will be really useful when you are trying to train a network from scratch. We'll start with a short recap, then look at training strategies, then at optimization and learning rates, architecture selection, hyperparameter optimization, and a strategy called ensembling. Another problem that occurs quite often is class imbalance, so we also look into that, and at the end we cover evaluation, which is also a very important point.
So let's recap. So far we've seen how to train a network: we know the fully connected and convolutional layers, the activation functions, and the loss functions we use during the training phase. We've seen different optimization strategies; we're essentially using a gradient descent optimizer, but you can use momentum and other strategies to stabilize it. And in the last lecture we also looked into regularization, techniques to ensure that your network is actually going in the right direction and not, for example, learning correlated feature maps.
Today we will see how to choose architectures and how to train and evaluate a deep neural network. And first things first: the test data. The very first thing you do when you start training a network is put your test data in a vault, in a safe; you take the test data away. We will later work with validation and training data, but throughout the entire process of choosing the architecture, fine-tuning, and so on, you keep the test data away. Make sure of that. Don't experiment with your test data during the design of your network or during the parameter optimization phase, because if you do, you get overly optimistic results. And don't put your test data into the training set, because then you will simply learn the test data and, of course, be able to predict it. So don't do that; that's very, very important. A simple guard against accidental leakage is sketched below.
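As a minimal sketch of such a leakage check, assuming your examples are NumPy arrays (X_train and X_test are hypothetical names), you can hash every example and assert that the two sets share none:

```python
import hashlib

import numpy as np

def fingerprint(example: np.ndarray) -> str:
    """Hash the raw bytes of one example so duplicates can be detected."""
    return hashlib.sha256(example.tobytes()).hexdigest()

# Hypothetical arrays of examples; any shared fingerprint means leakage.
train_hashes = {fingerprint(x) for x in X_train}
test_hashes = {fingerprint(x) for x in X_test}
overlap = train_hashes & test_hashes
assert not overlap, f"{len(overlap)} test examples also appear in the training set!"
```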
Also be careful how you select your test data: it should be a representative view of your data. Say you have a speech recognition problem and you train with 60 persons. If you really aim to build a speaker-independent system, then the same persons must not appear in both training and test; the speaker sets should be disjoint. Otherwise you again get optimistic results, accuracies that are simply higher than what you will later see in the actual use case. A group-wise split, as sketched below, enforces this.
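As a minimal sketch, assuming you have one label and one speaker identifier per example (X, y, and speaker_ids are hypothetical names), scikit-learn's GroupShuffleSplit keeps the speaker sets disjoint:

```python
from sklearn.model_selection import GroupShuffleSplit

# speaker_ids holds one group identifier per example; examples sharing a
# speaker ID always land on the same side of the split.
splitter = GroupShuffleSplit(n_splits=1, test_size=0.2, random_state=0)
train_idx, test_idx = next(splitter.split(X, y, groups=speaker_ids))

X_train, y_train = X[train_idx], y[train_idx]
X_test, y_test = X[test_idx], y[test_idx]
```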
So: select the test data and keep it in a vault. And already when you select the test data at the very beginning, you have to have considered how you want to evaluate your classifier or your system in the end; that already matters when you choose the test data. Typically you do splits: you can take some 50% or 60% for training and share the remaining data between validation and test. Okay, so take the test data away and keep it in a vault. We will only look at this data again when we are evaluating the actual system.
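A minimal sketch of such a split, assuming data X with labels y (hypothetical names) and a 60/20/20 division:

```python
from sklearn.model_selection import train_test_split

# First take 60% for training, then split the remaining 40% evenly
# between validation and test (60/20/20 overall).
X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, train_size=0.6, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.5, random_state=0)
```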
So obviously, one reason why you have to do this is that overfitting is extremely easy with neural networks. There are even papers that work with ImageNet, a really large database, but with random labels, and they could show that networks can even learn those random labels, which are supposedly meaningless. You should not be able to predict random labels, because they carry no information: you just randomly assign a new label to each example. But if you don't do this correctly, you will nevertheless be able to predict the random labels, which means that your classifier or your system is essentially just overfit and any interpretation of the result is meaningless.
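One way to turn this into a sanity check, sketched here under the assumption that you already have a training loop and a held-out validation set (model, X_train, and y_train are hypothetical names): train on deliberately shuffled labels and confirm that validation accuracy stays at chance.

```python
import numpy as np

# Destroy the label-input relationship on purpose.
rng = np.random.default_rng(0)
y_shuffled = rng.permutation(y_train)

# model.fit(X_train, y_shuffled)  # whatever training loop you already use
#
# Expected outcome: training accuracy can still approach 100% (the network
# memorizes), but validation accuracy should stay near chance, i.e. about
# 1 / n_classes. If it is clearly above chance, information is leaking
# between your training and evaluation data.
```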
Okay, so if you don't follow this, you will substantially underestimate the error. And obviously, when you choose the architecture, you also don't want to do that on your test data. Don't do it on the test data. Another very useful thing: when you start experimenting with different architectures, different parameter sets, debugging, and so on, do it on a smaller subset of the data, not on the full data set. If you need a week to train on the full data set and only then realize that some parameter was not set correctly, you will have lost that week. A quick way to carve out such a subset is sketched below.
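As a minimal sketch, assuming your training data lives in NumPy arrays (the names and the subset size are hypothetical):

```python
import numpy as np

# Draw a small random sample for fast experimentation; iterate on
# architecture and hyperparameters here before touching the full data.
rng = np.random.default_rng(0)
subset_size = 1000
idx = rng.choice(len(X_train), size=subset_size, replace=False)
X_small, y_small = X_train[idx], y_train[idx]
```

Once the pipeline works on the small subset, you can scale up to the full data set.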