5 - Deep Learning [ID:9115]

Okay, let's start. Welcome everyone to today's lecture on deep learning. I'm filling in for

Professor Meier today. My name is Katharina and today we will talk about regularization.

So far you've heard about the motivation for neural networks, the motivation more specifically

for CNNs, what is a loss function, what are the different parts of a network including

convolutional layers, fully connected layers, activation functions, pooling layers, etc.

and how to optimize the network with respect to a certain loss function. So we should be

more or less all set to start training our network, given a huge amount of training

data. However, generally in a real world setting we only have a very limited amount of training

data and we need to prevent powerful networks from overfitting to our training data such

that they generalize well to unobserved data. So today we will talk about the theoretical

background behind regularization. So how can we prevent our network from actually overfitting?
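As a small preview of one of the classical techniques covered later, here is a minimal sketch of L2 regularization (weight decay) on a linear model. The data, dimensions, and penalty strength here are illustrative assumptions, not taken from the lecture:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative over-parameterized regression: 20 features, only 15 samples.
X = rng.normal(size=(15, 20))
true_w = np.zeros(20)
true_w[:3] = [2.0, -1.0, 0.5]          # only a few features actually matter
y = X @ true_w + rng.normal(0.0, 0.1, size=15)

def fit(X, y, weight_decay, lr=0.01, steps=5000):
    """Gradient descent on the MSE plus an L2 penalty on the weights."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = 2 * X.T @ (X @ w - y) / len(y) + 2 * weight_decay * w
        w -= lr * grad
    return w

w_plain = fit(X, y, weight_decay=0.0)
w_reg = fit(X, y, weight_decay=0.1)

# The penalty shrinks the weight vector toward zero, limiting model capacity.
print(np.linalg.norm(w_plain), np.linalg.norm(w_reg))
```

The regularized weight vector has a strictly smaller norm: the penalty trades a little training error for a simpler, better-generalizing model.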

We will get into classical techniques, talk about normalization and strategies that are

more targeted to networks including dropout and initialization and then go to more advanced

topics such as transfer learning and multitask learning. So let's start with

an introduction to regularization. So assume that we have two classes and samples from

two classes and in this case we can actually see that they are pretty well separated and

pretty well separable. So you can think of a number of decision boundaries that separate

these two classes. However, generally our data doesn't look as nice. Instead it looks

more something like this. For example, because of sensor noise, measurement noise, because

of mislabeled data or because the data is ambiguous in a certain sense. Remember that

these ImageNet examples where we have ambiguous or even wrong labels. So we still want to

find a sensible decision boundary between those two. Now, depending on the model that

we choose, we can, for example, observe underfitting. So our model is not strong enough to separate

the two classes in a sensible way. Instead it finds a decision boundary that is too simple

to separate them. On the other end of the spectrum we can also have a very powerful

model that can adapt to our training data very well. And in this case this is represented

by this wiggly line that you see here. You should remember from pattern recognition that

in this case we don't expect our model to generalize very well to unobserved data and

that this wiggly line will look quite different depending on the training samples that we

actually have. This is what we call overfitting: the model adapts too closely to our training data. So what we

are actually looking for is a sensible boundary that discriminates the two classes. It shouldn't

be too closely adapted to the training data, but it also shouldn't underfit the underlying distribution.
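This trade-off can be sketched numerically. Below is a minimal example (the data and polynomial degrees are illustrative assumptions, not from the lecture): polynomials of increasing degree are fit to noisy samples, where a too-low degree underfits and a too-high degree matches the training samples almost perfectly but generalizes poorly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical 1-D regression data: noisy samples of a smooth function.
x_train = np.linspace(0.0, 1.0, 15)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0.0, 0.3, size=x_train.shape)

# Held-out data from the same distribution to measure generalization.
x_test = np.linspace(0.0, 1.0, 100)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0.0, 0.3, size=x_test.shape)

for degree in (1, 3, 14):  # too simple, reasonable, too flexible
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_err = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    test_err = np.mean((np.polyval(coeffs, x_test) - y_test) ** 2)
    print(f"degree {degree:2d}: train MSE {train_err:.3f}, test MSE {test_err:.3f}")
```

The degree-14 polynomial can pass through all 15 training points (the "wiggly line"), so its training error is near zero while its test error explodes; degree 1 is the too-simple decision boundary of the underfitting case.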

So to have a look at this from a more theoretical perspective, we now go to a regression problem

where we have a true underlying distribution and noise that is added to it.

Now we have a model that is estimated based on a data set and we can define the expected

loss as described here. So we have a mean squared loss and calculate its expectation value over a number of data sets that we use for training, i.e., over different instantiations of our model. So we can use the bias-variance decomposition to decompose

this expectation value into three parts: the bias term, the variance term, and the irreducible error. The name irreducible error already suits us in the sense that we don't really have to take care of this term anymore: we can't reduce it anyway, so we can simply ignore it in the following considerations. The bias term is quite interesting in that it contains both the expectation value of our model and the true distribution that underlies the data, so it represents the mean difference between our model and the underlying distribution. In the variance term, by contrast, the true distribution doesn't appear at all; we are only looking at the variance of different model instantiations, i.e., how much models trained on different data sets vary.
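The expected-loss equation referred to on the slide is not reproduced in the transcript; in standard notation, with a model \(\hat{f}_D\) estimated from training set \(D\), true function \(f\), and noise variance \(\sigma^2\), the decomposition reads:

```latex
\mathbb{E}_D\!\left[\left(\hat{f}_D(x) - y\right)^2\right]
  = \underbrace{\left(\mathbb{E}_D\!\left[\hat{f}_D(x)\right] - f(x)\right)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}_D\!\left[\left(\hat{f}_D(x) - \mathbb{E}_D\!\left[\hat{f}_D(x)\right]\right)^2\right]}_{\text{variance}}
  + \underbrace{\sigma^2}_{\text{irreducible error}}
```

The first term is the mean difference between model and true distribution discussed above, and the second term contains no reference to \(f\) at all, only the spread of the model instantiations.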

Part of a video series
Access: open access
Duration: 01:04:36 min
Recording date: 2018-05-09
Uploaded: 2018-05-09 16:09:04
Language: en-US
Tags: reconstruction energy deep spatial initialization tasks dimension transfer dataset capacity batch weights auxiliary task normalization pattern exercise problem loss recognition networks learning training data classification model parameters