Welcome everybody to today's deep learning lecture.
Today we want to talk a bit about common practices, the stuff that you need to know to get everything
implemented in practice.
And there is very little theory behind the best solutions.
So here is a short outline of the next couple of videos and the topics we will look
at.
So we will think about the problems that we currently have and how far we have come.
Then we talk about training strategies, in particular optimization and the learning rate,
and a couple of tricks for adjusting them.
Architecture selection and hyperparameter optimization.
One trick that is really useful is ensembling. Typically, people also have to deal with class
imbalance, and there are some very interesting approaches for handling it.
So finally we look into evaluation and how to get a good estimate of how well our network
is actually performing.
So far we have seen all the nuts and bolts of how to train the network.
We have the fully connected and convolutional layers, the activation functions, the
loss function, optimization, and regularization.
And today we will talk about how to choose the architecture, train and evaluate a deep
neural network.
And the very first thing is test data.
Test data goes into the vault.
Ideally, the test set should be kept in a vault and brought out only at the end of the
data analysis, as Hastie and colleagues teach in The Elements of Statistical Learning.
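To make this concrete, here is a minimal sketch of such a split; the data, split ratios, and random seed are placeholder assumptions, not a prescription from the lecture.

# Minimal sketch of the "test set in the vault" idea; the data, split
# ratios, and random seed are placeholder assumptions.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 20)             # placeholder features
y = np.random.randint(0, 2, size=1000)   # placeholder labels

# Split off the test set once, at the very beginning, and lock it away.
X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# All further work (model selection, tuning) uses only a train/validation split.
X_train, X_val, y_train, y_val = train_test_split(X_dev, y_dev, test_size=0.25, random_state=0)

# Only at the very end of the analysis: evaluate the final model once on (X_test, y_test).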
So first things first: overfitting is extremely easy with neural networks.
Remember the experiment where networks were able to fit ImageNet with completely random labels.
So the true test error can be underestimated substantially when you
use the test set for model selection.
Not a good idea, because the resulting numbers no longer tell you anything about generalization.
So when we choose the architecture, that is typically the first element of model selection,
and it should never be done on the test set.
We can do initial experimentation on a smaller subset of the data to figure
out what works, but we never touch the test set while selecting the architecture.
There is nothing inherently evil about peeking at the test set, except that in safety-critical applications a model whose performance was overestimated can actually cost lives.
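A hedged sketch of what such a selection loop could look like, reusing the train/validation split from above; the candidate hidden-layer sizes and the scikit-learn MLPClassifier are illustrative choices, not the lecture's recipe.

# Architecture selection driven purely by the validation score; assumes the
# X_train/X_val/y_train/y_val arrays from the split sketched above.
from sklearn.neural_network import MLPClassifier

candidates = [(32,), (64, 64), (128, 64, 32)]    # hypothetical architectures
best_score, best_arch = -float("inf"), None
for hidden in candidates:
    model = MLPClassifier(hidden_layer_sizes=hidden, max_iter=300, random_state=0)
    model.fit(X_train, y_train)                  # train on the training split
    score = model.score(X_val, y_val)            # compare on the validation split
    if score > best_score:
        best_score, best_arch = score, hidden
print("selected architecture:", best_arch)       # the test set was never touched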
Okay, so let's look at a couple of training strategies.
Before training, check your gradients.
Check the loss function.
Check that your own layer implementations compute what they should.
And if you implemented your own layer, then compare the analytic and the numerical gradient.
You can use central differences for the numerical gradient.
Use relative errors instead of absolute differences, and keep the numerics in mind:
use double precision for checking, temporarily scale the loss function if you observe very
small values, and choose the step size h appropriately.
Then we have a couple of additional recommendations.
If you only use a few data points, you will have fewer issues with non-differentiable
parts of the loss function.
You can also train the network for a short period of time and only then perform the gradient
checks.
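Putting these points together, here is a minimal gradient-check sketch; the toy loss f and its analytic gradient are illustrative assumptions standing in for your own layer.

# Numerical gradient check with central differences, relative errors,
# double precision, and an explicit step size h, as recommended above.
import numpy as np

def f(w):                        # toy loss; stands in for your own layer/loss
    return np.sum(w ** 3)

def analytic_grad(w):            # hand-derived gradient of f: 3 w^2
    return 3 * w ** 2

def numeric_grad(f, w, h=1e-5):
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = h
        g[i] = (f(w + e) - f(w - e)) / (2 * h)   # central differences
    return g

w = np.random.randn(5).astype(np.float64)        # double precision, few parameters
ga, gn = analytic_grad(w), numeric_grad(f, w)
rel_err = np.abs(ga - gn) / np.maximum(np.abs(ga) + np.abs(gn), 1e-12)
print("max relative error:", rel_err.max())      # expect roughly 1e-7 or smaller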