3 - Deep Learning [ID:12147]

Good morning everyone, and a very warm welcome. As you can easily see, I'm not Professor Meyer; I'm standing in for him today, and at the moment it looks like he will be back next week. I would like to try something out: a five-minute break in the middle, just to let everyone relax a little, because the material today is quite dense. We can see whether this works for you or not, but as a suggestion, you're also welcome to ask questions during those five minutes. So roughly around 9:15 we will have a very short break, just to make sure your attention stays focused.

The material we will cover today is very relevant for the exam, or so I've heard, so this is really essential stuff that you should get familiar with. If there are issues with the material, if I'm not explaining something well or you just don't get it on the first pass, please make sure that you ask questions. Again, just to emphasize: this is very important material. It is the basis of deep learning, and these are tools that basically every machine learning researcher should be familiar with. The problem in machine learning is very often that you can get pretty good results even without knowing these basics, but you gain much better insight, and you get models that generalize a lot better in the wild, if you know these basics and know what your data looks like.

So, to briefly recap what Professor Meyer talked about at the end of last week's lecture: we were talking about the categorical distribution.

We assume that our ground truth labels follow a probability distribution given by P, and we try to estimate the likelihood of seeing a certain observation. In this example we see a certain observation of a coin flip, and we try to estimate its probability, or its likelihood. I'm going to be a little vague here because I keep intermixing the two terms; strictly speaking, we estimate the probability here and, later on, the likelihood of seeing this observation.
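As a small aside (my own sketch, not part of the slides), the coin flip is just the two-outcome special case of the categorical distribution, a Bernoulli distribution, and its maximum likelihood estimate of the head probability is simply the relative frequency of heads among the observed flips:

```python
import numpy as np

# Hypothetical coin-flip observations: 1 = heads, 0 = tails.
flips = np.array([1, 0, 1, 1, 0, 1, 1, 0])

# Maximum likelihood estimate of p(heads) for a Bernoulli distribution:
# the relative frequency of heads in the data.
p_heads = flips.mean()

# Likelihood of the observed sequence under this estimate:
# the product over all flips of p^y * (1 - p)^(1 - y).
likelihood = np.prod(p_heads**flips * (1.0 - p_heads)**(1 - flips))

print(f"ML estimate of p(heads): {p_heads:.3f}")
print(f"Likelihood of the observed flips: {likelihood:.5f}")
```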

Now, in our deep learning setting, what we actually have is a score ŷ that we transfer to a probabilistic setting. You can imagine that we have a network that, depending on a certain input, outputs more or less a probability vector, and we enforce the properties of a probability distribution by using the softmax function. For the moment you can assume that all entries of the vector are between 0 and 1 and that they sum up to 1 in total; this is what the softmax function enforces.
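As a quick sketch (my own example code, with assumed names), a numerically stable softmax that turns a raw score vector into such a probability vector could look like this:

```python
import numpy as np

def softmax(scores):
    """Map a vector of raw network scores to a probability vector.

    Subtracting the maximum score before exponentiating is the usual
    trick to avoid overflow; it does not change the result.
    """
    shifted = scores - np.max(scores)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores)

# Hypothetical raw scores for a three-class problem.
y_hat = softmax(np.array([2.0, 1.0, -1.0]))
print(y_hat)        # every entry lies between 0 and 1
print(y_hat.sum())  # the entries sum to 1
```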

Now, what this means, if we assume that our labels are categorically distributed, so that they follow this probability distribution, and that the labels here, the y's without the hat, are either 0 or 1, is that our network gives us the probability of seeing a certain output, or a certain event, for a given input. This is what we model here: we assume that the labels follow a categorical distribution.
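To make that concrete (again my own sketch, with assumed variable names): with one-hot labels, the categorical likelihood of a single sample collapses to the probability that the network assigns to the true class:

```python
import numpy as np

# Hypothetical single sample: one-hot ground truth label y and the
# probability vector y_hat produced by the network's softmax.
y = np.array([0, 1, 0])            # the true class is class 1
y_hat = np.array([0.2, 0.7, 0.1])  # predicted class probabilities

# Categorical likelihood: the product over classes of y_hat_k ** y_k.
# Because y is one-hot, this is just the probability of the true class.
likelihood = np.prod(y_hat ** y)
print(likelihood)  # 0.7
```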

Now, as before in the regression setting, where you derived the maximum likelihood estimator for regression based on a model with Gaussian noise, you do something similar here. We write down the negative log-likelihood, which we denote by L(W). In our neural network the only things that we can really adapt are the weights; everything else is fixed. We have a fixed training set that we train our network on, which gives us a certain set of outputs, and we have our ground truth labels. The whole data set has M samples, we of course look at all the samples in the data set, and what we can adapt are the parameters W. What we want to do is maximize the likelihood of seeing our ground truth labels, given the probabilities that our network generates; in other words, we want to maximize the likelihood of seeing the ground truth labels of our training set. So this is what this expression captures here, because we don't like …
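For reference (my own sketch, with assumed names; this is the standard form of the loss): for one-hot, categorically distributed labels, the negative log-likelihood L(W) is the cross-entropy, the negative sum over all M samples and all classes k of y_k * log(y_hat_k), and minimizing it with respect to the weights W is the same as maximizing the likelihood of the ground truth labels:

```python
import numpy as np

def negative_log_likelihood(y_true, y_pred, eps=1e-12):
    """Cross-entropy loss for one-hot labels and softmax outputs.

    L(W) = - sum_i sum_k y_k^(i) * log(y_hat_k^(i)), summed over all
    M samples; eps guards against taking log(0).
    """
    return -np.sum(y_true * np.log(y_pred + eps))

# Hypothetical mini data set with M = 2 samples and 3 classes.
y_true = np.array([[0, 1, 0],
                   [1, 0, 0]])
y_pred = np.array([[0.2, 0.7, 0.1],   # softmax outputs of the network
                   [0.6, 0.3, 0.1]])

print(negative_log_likelihood(y_true, y_pred))  # -(log 0.7 + log 0.6) ≈ 0.87
```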

Part of a video series
Access: Open access
Duration: 01:23:24 min
Recording date: 2019-11-05
Uploaded: 2019-11-05 20:09:02
Language: en-US
Tags: batch local optimization learning entropy functions subgradients gradient function descent problems iteration rate vesal convex