6 - Deep Learning [ID:11574]
Transcript (50 of 690 entries shown)

Excellent. So great to see all of you at this early hour. Today we have two topics that we want to talk about in this lecture. One will be common practices, and the other one is efficient architectures, so good architectures for training. I have to rush a little bit, but I hope that we can still get through both of the topics.

Okay, so let's start with the common practices. I want to do a short recap of what we've been discussing so far, then cover some training strategies, in particular how to optimize and how to set learning rates, and then architecture selection, hyperparameter optimization, ensembling, and class imbalance. So let's recap what we did. So far we have essentially talked about all the nuts and bolts of how to train a network: the gradient descent procedure, fully connected and convolutional layers, activation functions, loss functions, optimization, and regularization. Today we want to talk about how to choose an architecture and how to train and evaluate a deep neural network. And the first thing, one of the most

important things, is this: your test data is the last thing you look at. You determine your test data, sampling it in the same way as you sampled the validation and the training data, and then you put it into your vault. You put it in a safe place where you never look at it, and you only look at the test data at the very end. If you do otherwise, you get overly optimistic results, and this is one of the most common mistakes people make: they look into the test data set too often and then implicitly create a model that is overfitting even on the test data set. If you just do enough experiments, by chance you can end up with a model that, although the test data is never looked at directly, predicts what is going to be in the test data set. So if you really

want to get the true test-set generalization error, you can only do that if you keep everything disjoint. And if you want to figure out the parts and pieces of your network, if you want to debug, if you want to check whether your gradient points in the correct direction, then you do the initial experimentation on a smaller subset of the data set. So if you have a million training images and you want to test whether you implemented the gradient correctly, you don't want to do that with the whole million training images. You only do these things on very small subsets of the training set; the test set comes very last.
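The disjoint split described above can be sketched as follows; the helper, the fractions, and the NumPy-based setup are illustrative assumptions, not values from the lecture:

```python
import numpy as np

def split_disjoint(n_samples, val_frac=0.15, test_frac=0.15, seed=0):
    """Shuffle indices once and cut them into three disjoint sets.

    All three sets are sampled the same way (one shared shuffle); the
    test indices go "into the vault" and are used only once, at the end.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n_samples)
    n_test = int(n_samples * test_frac)
    n_val = int(n_samples * val_frac)
    test_idx = idx[:n_test]
    val_idx = idx[n_test:n_test + n_val]
    train_idx = idx[n_test + n_val:]
    return train_idx, val_idx, test_idx

# The three index sets are pairwise disjoint and together cover all samples.
train_idx, val_idx, test_idx = split_disjoint(1000)
```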

Now, some training strategies. If you implemented something anew, you use centered differences for the numerical gradient to check that your gradient points in the right direction.
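A minimal sketch of such a centered-differences check, assuming a NumPy setup; the example function and step size are illustrative:

```python
import numpy as np

def numerical_gradient(f, x, h=1e-6):
    """Centered differences (f(x + h*e_i) - f(x - h*e_i)) / (2h) per coordinate.
    Use double precision and pick the step size h to lie within numerical
    precision, as recommended in the lecture."""
    x = np.asarray(x, dtype=np.float64)
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e.flat[i] = h
        grad.flat[i] = (f(x + e) - f(x - e)) / (2.0 * h)
    return grad

def relative_error(a, b, eps=1e-12):
    """Relative error is more meaningful than absolute differences."""
    return np.abs(a - b) / np.maximum(np.abs(a) + np.abs(b), eps)

# Illustrative check: f(x) = sum(x^2) has the analytic gradient 2x.
x = np.array([1.0, -2.0, 3.0])
num = numerical_gradient(lambda v: np.sum(v ** 2), x)
ana = 2.0 * x
err = relative_error(num, ana)  # should be tiny for a correct gradient
```

In a real network you would apply the same comparison to the analytic backpropagation gradient, with regularization, data augmentation, and dropout turned off, as discussed below.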

If you didn't check your implementation, in particular the gradients, things can really go wrong, so you want to use this for debugging. You should also use the relative error instead of absolute differences, and in the numerics you want to use double precision for the check, just to make sure that everything is computed accurately. You can also scale the loss function if you observe very small values. Then, obviously, you want to choose the step size h for the numerical gradient appropriately so that

it lies within your numerical precision. A few additional recommendations: use only a few data points, so that you have fewer issues with non-differentiable parts of the loss function; train the network for a short period of time and perform checks that you really get a reduction in the loss function; and check the gradient first without and then with regularization terms, because regularization terms can already make a difference and you want to debug the gradient of the data loss first. You also want to turn off data augmentation and dropout for these gradient checks. Then you check

the initialization and the loss: you randomly initialize the layers and compute the loss for each class on the untrained network with regularization turned off. Then you compare this to the loss you get when you choose the class randomly: your untrained network should essentially produce random predictions. You repeat that with multiple random initializations. Then you can start the training, and in the training you first want to check whether the architecture is in general capable of learning the task. So before you go to the full training data set, you take a small subset, maybe 20 samples, and check that the network can overfit it.
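Both sanity checks above, the loss of an untrained classifier and overfitting a tiny subset, can be sketched with a plain linear softmax classifier in NumPy; the data sizes, learning rate, and iteration count are illustrative assumptions, not values from the lecture:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, C = 20, 50, 4              # 20 samples; dims and classes are illustrative
X = rng.standard_normal((n, d))  # stand-in for a small data subset
y = rng.integers(0, C, size=n)
Y = np.eye(C)[y]                 # one-hot labels

W = np.zeros((d, C))             # "untrained" linear softmax classifier

def loss_and_grad(W):
    logits = X @ W
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    p = np.exp(logits)
    p /= p.sum(axis=1, keepdims=True)
    loss = -np.mean(np.log(p[np.arange(n), y]))          # cross-entropy
    grad = X.T @ (p - Y) / n
    return loss, grad

# Check 1: the untrained model makes uniform (random) predictions, so the
# cross-entropy loss is ln(C) here, with regularization turned off.
init_loss, _ = loss_and_grad(W)

# Check 2: the model should be able to (over)fit the tiny subset.
for _ in range(2000):
    _, g = loss_and_grad(W)
    W -= 0.1 * g                 # plain gradient descent, illustrative step size

final_loss, _ = loss_and_grad(W)
# final_loss should end up far below init_loss on these 20 samples
```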

Part of a video series:

Accessible via: Open Access

Duration: 01:16:54 min

Recording date: 2019-06-06

Uploaded: 2019-06-06 19:09:19

Language: en-US

Deep Learning (DL) has attracted much interest in a wide range of applications such as image recognition, speech recognition and artificial intelligence, both from academia and industry. This lecture introduces the core elements of neural networks and deep learning, it comprises:

  • (multilayer) perceptron, backpropagation, fully connected neural networks

  • loss functions and optimization strategies

  • convolutional neural networks (CNNs)

  • activation functions

  • regularization strategies

  • common practices for training and evaluating neural networks

  • visualization of networks and results

  • common architectures, such as LeNet, AlexNet, VGG, GoogLeNet

  • recurrent neural networks (RNN, TBPTT, LSTM, GRU)

  • deep reinforcement learning

  • unsupervised learning (autoencoder, RBM, DBM, VAE)

  • generative adversarial networks (GANs)

  • weakly supervised learning

  • applications of deep learning (segmentation, object detection, speech recognition, ...)

Tags

architectures layers data classification architects