Excellent. So great to see all of you at this early hour. Today we have two topics to talk about in this lecture: one is common practices, and the other is efficient architectures, so good architectures for training. Let's see. I have to rush a little bit, but I hope we can still get through both topics.
Okay, so let's start with the common practices. I want to do a short recap of what we have discussed so far, then cover some training strategies, in particular how to optimize and how to set learning rates, and then architecture selection, hyperparameter optimization, ensembling, and class imbalance. So let's recap what we did. So far
we essentially talked about all the nuts and bolts of how to train a network: the gradient descent procedure, fully connected and convolutional layers, activation functions, loss functions, optimization, and regularization. Today we want to talk about how to choose an architecture and how to train and evaluate a deep neural network. And the first thing, one of the most
important things, is this: your test data is the last thing you look at. You determine your test data, you sample it in the same way as you sampled the validation and the training data, and then you put it into your vault. You put it in a safe place where you never look at it, and you only look at the test data at the very end. If you do otherwise, you get overly optimistic results, and this is one of the most common mistakes people make: they look too often at the test data set and then implicitly create a model that overfits even on the test data set. If you just run enough experiments, by chance you can find a configuration that, although the test data is never looked at directly, happens to fit what is in the test data set. So if you really want to estimate the true test-set generalization error, you can only do that if you keep everything disjoint. And if you want to figure out the parts and pieces of your network, if you want to debug, if you want to check whether your gradient points in the correct direction, then you do the initial experimentation on a smaller subset of the data set. If you have a million training images and you want to test whether you implemented the gradient correctly, you don't want to do that with the whole million training images. So you only do these things on very small subsets of the training set; the test set comes very last.
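The disjoint splitting and the small debugging subset described above can be sketched as follows; the dataset size and split proportions are hypothetical choices for illustration, not values from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)        # fixed seed so the split is reproducible
n = 1000                              # hypothetical dataset size
idx = rng.permutation(n)              # shuffle once, then cut into disjoint parts

test_idx = idx[:200]                  # sampled the same way, then locked away until the very end
val_idx = idx[200:400]
train_idx = idx[400:]
debug_idx = train_idx[:20]            # tiny training subset for quick debugging

# the three splits are pairwise disjoint
assert set(test_idx).isdisjoint(val_idx)
assert set(test_idx).isdisjoint(train_idx)
assert set(val_idx).isdisjoint(train_idx)
```

Because the split is done by index on a single shuffled permutation, the three sets can never overlap, and the debugging subset stays strictly inside the training portion.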
Now for some training strategies. If you implement something new, use centered differences for the numerical gradient to check that your analytical gradient points in the right direction.
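Such a centered-difference gradient check might look like the following sketch; the function names and the quadratic example loss are hypothetical, not from the lecture:

```python
import numpy as np

def gradient_check(f, grad_f, w, h=1e-5):
    """Compare an analytical gradient against centered differences.

    f      : scalar loss function of a parameter vector (hypothetical interface)
    grad_f : function returning the analytical gradient at w
    w      : parameter vector; cast to float64, since the check needs double precision
    h      : step size, chosen to lie within numerical precision
    """
    w = w.astype(np.float64)
    analytical = grad_f(w)
    numerical = np.zeros_like(w)
    for i in range(w.size):
        e = np.zeros_like(w)
        e[i] = h
        # centered difference: (f(w + h*e_i) - f(w - h*e_i)) / (2h)
        numerical[i] = (f(w + e) - f(w - e)) / (2 * h)
    # relative error instead of absolute differences
    rel_err = np.abs(analytical - numerical) / np.maximum(
        1e-12, np.abs(analytical) + np.abs(numerical))
    return rel_err.max()

# Example: quadratic loss f(w) = 0.5 * ||w||^2, whose gradient is w itself.
w0 = np.array([1.0, -2.0, 3.0])
err = gradient_check(lambda w: 0.5 * np.dot(w, w), lambda w: w, w0)
assert err < 1e-7   # analytical and numerical gradients agree
```

The maximum relative error over all components is what you inspect: values far above numerical precision indicate a bug in the analytical gradient.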
If you didn't check your implementation, in particular the gradients, things can really go wrong, so use this check for debugging. Use the relative error instead of absolute differences, and use double precision for the numerical check, just to make sure everything is computed accurately. You can also scale the loss function if you observe very small values. And obviously you want to choose the step size h for the numerical gradient appropriately, so that it lies within your numerical precision. Then a few additional recommendations: use
only a few data points, so that you have fewer issues with non-differentiable parts of the loss function; train the network for a short period of time and check that you really get a reduction in the loss function; check the gradient first without and then with regularization terms, because regularization terms can already make a difference and you want to debug the raw gradient first; and turn off data augmentation and dropout for these gradient checks. So then you check
the initialization via the loss: you use the random initialization of the layers and compute the loss for each class on the untrained network with regularization turned off. Then you compare this loss to the loss of choosing the class at random; your untrained network should essentially produce random predictions. You repeat that with multiple random initializations. Then you can start training, and during training you first want to check whether the architecture is in general capable of learning the task. So before you go to the full training data set, you take a small subset, maybe 20 samples, and check that the network can fit this tiny subset almost perfectly.
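Both sanity checks, the initial-loss comparison against random guessing and the ability to overfit a tiny subset, can be sketched with a plain linear softmax classifier in NumPy. The sizes, learning rate, and iteration count below are hypothetical choices for illustration, not values from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
C, D, N = 10, 32, 20          # hypothetical: 10 classes, 32 features, 20 samples

def softmax_loss(W, X, y):
    """Mean cross-entropy of a linear softmax classifier (illustrative model)."""
    scores = X @ W
    scores -= scores.max(axis=1, keepdims=True)   # for numerical stability
    p = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    return -np.log(p[np.arange(len(y)), y]).mean()

X = rng.standard_normal((N, D))
y = rng.integers(0, C, size=N)

# Check 1: with small random weights, the untrained model should be close to
# random guessing, i.e. loss ~= -log(1/C) = log(C). Repeat over several inits.
for _ in range(5):
    W = 0.001 * rng.standard_normal((D, C))
    loss0 = softmax_loss(W, X, y)
    assert abs(loss0 - np.log(C)) < 0.1

# Check 2: the model should be able to overfit the tiny subset. Here we run
# plain gradient descent on the 20 samples; a capable model should reach
# (nearly) perfect training accuracy on such a small set.
W = 0.001 * rng.standard_normal((D, C))
for _ in range(500):
    scores = X @ W
    scores -= scores.max(axis=1, keepdims=True)
    p = np.exp(scores) / np.exp(scores).sum(axis=1, keepdims=True)
    p[np.arange(N), y] -= 1.0                     # gradient of mean cross-entropy
    W -= 0.5 * (X.T @ p) / N
acc = ((X @ W).argmax(axis=1) == y).mean()
assert acc >= 0.95
```

For a real deep network the same idea applies: the initial loss should be close to log C for C classes, and a short training run on roughly 20 samples should drive the training accuracy to nearly 100 percent; if it does not, debug the architecture before touching the full data set.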
Presenters
Accessible via
Open access
Duration
01:16:54 min
Recording date
2019-06-06
Uploaded on
2019-06-06 19:09:19
Language
en-US
Deep Learning (DL) has attracted much interest in a wide range of applications such as image recognition, speech recognition and artificial intelligence, both from academia and industry. This lecture introduces the core elements of neural networks and deep learning. It comprises:
- (multilayer) perceptron, backpropagation, fully connected neural networks
- loss functions and optimization strategies
- convolutional neural networks (CNNs)
- activation functions
- regularization strategies
- common practices for training and evaluating neural networks
- visualization of networks and results
- common architectures, such as LeNet, AlexNet, VGG, GoogLeNet
- recurrent neural networks (RNN, TBPTT, LSTM, GRU)
- deep reinforcement learning
- unsupervised learning (autoencoder, RBM, DBM, VAE)
- generative adversarial networks (GANs)
- weakly supervised learning
- applications of deep learning (segmentation, object detection, speech recognition, ...)