Welcome everybody to today's deep learning lecture.
Today we want to talk a bit about common practices, the stuff that you need to know to get everything
implemented in practice.
And there is very little theory behind the best solutions.
So here is a short outline of the next couple of videos and the topics we will look
at.
So we will think about the problems that we currently have and how far we have come.
Then we talk about training strategies, in particular optimization and the learning rate,
and a couple of tricks for adjusting them.
Architecture selection and hyperparameter optimization.
One trick that is really useful is ensembling. Typically, people also have to deal with class
imbalance, and there are some very interesting approaches for handling it.
So finally we look into evaluation and how to get a good estimate of how well our network
is actually performing.
So far we have seen all the nuts and bolts of how to train the network.
We have the fully connected and convolutional layers, the activation functions, the
loss function, optimization, and regularization.
And today we will talk about how to choose the architecture, train and evaluate a deep
neural network.
And the very first thing is test data.
Test data goes into the vault.
Ideally, the test set should be kept in a vault and brought out only at the end of the
data analysis, as Hastie and colleagues teach in The Elements of Statistical Learning.
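To make this concrete, here is a minimal sketch of such a split; the data, split ratios, and random seed are placeholder assumptions, not a prescription from the lecture.

# Minimal sketch of the "test set in the vault" idea; the data, split
# ratios, and random seed are placeholder assumptions.
import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.rand(1000, 20)             # placeholder features
y = np.random.randint(0, 2, size=1000)   # placeholder labels

# Split off the test set once, at the very beginning, and lock it away.
X_dev, X_test, y_dev, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# All further work (model selection, tuning) uses only a train/validation split.
X_train, X_val, y_train, y_val = train_test_split(X_dev, y_dev, test_size=0.25, random_state=0)

# Only at the very end of the analysis: evaluate the final model once on (X_test, y_test).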
So first things first: overfitting is extremely easy with neural networks.
Remember the experiment where networks were able to fit ImageNet with completely random labels.
So the true test error can be underestimated substantially when you
use the test set for model selection.
Not a good idea, because the resulting numbers no longer tell you anything about generalization.
So when we choose the architecture, that is typically the first element of model selection,
and it should never be done on the test set.
We can do initial experimentation on a smaller subset of the data to figure
out what works, but we never touch the test set while selecting the architecture.
There is nothing inherently evil about peeking at the test set, except that in safety-critical applications a model whose performance was overestimated can actually cost lives.
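A hedged sketch of what such a selection loop could look like, reusing the train/validation split from above; the candidate hidden-layer sizes and the scikit-learn MLPClassifier are illustrative choices, not the lecture's recipe.

# Architecture selection driven purely by the validation score; assumes the
# X_train/X_val/y_train/y_val arrays from the split sketched above.
from sklearn.neural_network import MLPClassifier

candidates = [(32,), (64, 64), (128, 64, 32)]    # hypothetical architectures
best_score, best_arch = -float("inf"), None
for hidden in candidates:
    model = MLPClassifier(hidden_layer_sizes=hidden, max_iter=300, random_state=0)
    model.fit(X_train, y_train)                  # train on the training split
    score = model.score(X_val, y_val)            # compare on the validation split
    if score > best_score:
        best_score, best_arch = score, hidden
print("selected architecture:", best_arch)       # the test set was never touched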
Okay, so let's look at a couple of training strategies.
Before training, check your gradients.
Check the loss function.
Check that your own layer implementations compute what they should.
And if you implemented your own layer, then compare the analytic and the numerical gradient.
You can use central differences for the numerical gradient.
Use relative errors instead of absolute differences, and keep the numerics in mind:
use double precision for checking, temporarily scale the loss function if you observe very
small values, and choose the step size h appropriately.
Then we have a couple of additional recommendations.
If you only use a few data points, you will have fewer issues with non-differentiable
parts of the loss function.
You can also train the network for a short period of time and only then perform the gradient
checks.
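Putting these points together, here is a minimal gradient-check sketch; the toy loss f and its analytic gradient are illustrative assumptions standing in for your own layer.

# Numerical gradient check with central differences, relative errors,
# double precision, and an explicit step size h, as recommended above.
import numpy as np

def f(w):                        # toy loss; stands in for your own layer/loss
    return np.sum(w ** 3)

def analytic_grad(w):            # hand-derived gradient of f: 3 w^2
    return 3 * w ** 2

def numeric_grad(f, w, h=1e-5):
    g = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = h
        g[i] = (f(w + e) - f(w - e)) / (2 * h)   # central differences
    return g

w = np.random.randn(5).astype(np.float64)        # double precision, few parameters
ga, gn = analytic_grad(w), numeric_grad(f, w)
rel_err = np.abs(ga - gn) / np.maximum(np.abs(ga) + np.abs(gn), 1e-12)
print("max relative error:", rel_err.max())      # expect roughly 1e-7 or smaller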