Good morning everyone, and a very warm welcome. As you can easily see, I'm not Professor Meyer. I'm standing in for him today, and at the moment it looks like he will be back next week. I would like to try something out: a five-minute break in the middle, just to let everyone relax a little bit, because the material here is quite dense. We can see whether this works for you or not, but as a suggestion, you're also welcome to ask questions during these five minutes. So probably around 9:15 we will have a very short break, just to make sure your attention is still focused.
The material that we will talk about today is very relevant for the exam, or so I've heard, so this is really essential stuff that you should get yourself familiar with. If there are issues with the material, if I'm not explaining something well or you just don't get it on the first pass, please make sure that you ask questions. Again, just to emphasize: this is very important material. It is kind of the basis of deep learning, and these are tools that basically every machine learning researcher should be familiar with. The problem with machine learning is very often that you can get by and get pretty good results even without knowing these basics, but you gain much better insight, and you get models that generalize a lot better in the wild, if you know these basics and if you know what your data looks like.
So, to briefly recap what Professor Meyer talked about at the end of last week's lecture: we were talking about the categorical distribution. We assume that our ground-truth labels follow a probability distribution given by P, and we try to estimate the likelihood of seeing a certain observation. In this case we see a certain observation of a coin flip, and we try to estimate its probability, or its likelihood. I'm going to be a little bit vague here, because I'm intermixing these two terms; strictly speaking, we are estimating here the probability and, later, the likelihood of seeing this observation.
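As a small illustration of that coin-flip case (my own example, not from the slides): if the coin comes up heads with probability $p$, the likelihood of an observed sequence with $h$ heads and $t$ tails is

$$\mathcal{L}(p) = p^{h}\,(1-p)^{t},$$

and maximizing this over $p$ gives the intuitive estimate $\hat{p} = h/(h+t)$. The categorical case below is the same idea with more than two possible outcomes.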
Now, in our deep learning setting, what we actually have is a score vector y-hat that we transfer to a probabilistic setting. So you can imagine that we have a network that, depending on a certain input, outputs a probability vector, more or less. We enforce the properties of a probability distribution by using the softmax function: you can assume that all entries in the vector are between 0 and 1 and that they sum up to 1 in total. This is what the softmax function enforces.
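To make that concrete, here is a minimal sketch of how such a softmax could be computed; the function name and the max-subtraction trick for numerical stability are my own choices, not something prescribed in the lecture.

```python
import numpy as np

def softmax(scores):
    """Turn a vector of raw network scores into a probability vector.

    Subtracting the maximum score first does not change the result
    (the constant cancels in the ratio) but avoids overflow in exp().
    """
    shifted = scores - np.max(scores)
    exp_scores = np.exp(shifted)
    return exp_scores / np.sum(exp_scores)

# Example: three raw scores from the network for a 3-class problem.
y_hat = softmax(np.array([2.0, 1.0, 0.1]))
print(y_hat)        # all entries are between 0 and 1
print(y_hat.sum())  # and they sum up to 1
```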
Now, what this basically means, if we assume that our labels are categorically distributed, so that they follow this probability distribution, and that these labels here, the y's without the hat, are either 0 or 1, is that the probabilities of seeing a certain output, a certain event, are given by our network. So this is what we model here: we assume that the labels follow a categorical distribution.
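Written out (in my own notation, which may differ slightly from the slides): with a one-hot label $y \in \{0,1\}^K$, $\sum_k y_k = 1$, and network output $\hat{y}$, the categorical distribution assigns the observed label the probability

$$P(y \mid \hat{y}) = \prod_{k=1}^{K} \hat{y}_k^{\,y_k},$$

which simply picks out the predicted probability $\hat{y}_k$ of the class that was actually observed.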
Now, as before in the regression setting, where you derived the maximum likelihood estimator for a regression model with Gaussian noise, you do something similar here.
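As a reminder of that analogy (sketched here from memory, so the notation is my assumption and not necessarily the one used last week): under the model $y = f_W(x) + \varepsilon$ with $\varepsilon \sim \mathcal{N}(0, \sigma^2)$, the negative log likelihood of the data is

$$-\log \prod_{i=1}^{M} \mathcal{N}\!\left(y^{(i)} \mid f_W(x^{(i)}), \sigma^2\right) = \frac{1}{2\sigma^2} \sum_{i=1}^{M} \left(y^{(i)} - f_W(x^{(i)})\right)^2 + \text{const},$$

so maximizing the likelihood is the same as minimizing the squared error. The categorical case follows the same recipe, just with a different distribution.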
So we write down the negative log likelihood, which we denote by L of W. In our neural network, the only thing that we can really adapt are the weights; everything else is fixed. We have a fixed training set that we train our network on, which gives us a certain set of outputs, and we have our ground-truth labels that we can see here. The data set has M samples, we of course look at all the samples in our data set, and what we can really adapt are these parameters W.
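Putting the pieces together, the loss being referred to should look something like the following (again in my own notation for the indices): with M training samples, K classes, one-hot labels $y^{(i)}$ and softmax outputs $\hat{y}^{(i)}$,

$$L(W) = -\sum_{i=1}^{M} \log P\!\left(y^{(i)} \mid \hat{y}^{(i)}\right) = -\sum_{i=1}^{M} \sum_{k=1}^{K} y_k^{(i)} \log \hat{y}_k^{(i)},$$

which is exactly the cross-entropy loss; only the weights W, through $\hat{y}^{(i)}$, are free to change.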
What we want to do is maximize the likelihood of seeing our ground-truth labels, given the probabilities that our network generates. So basically, we want to maximize the likelihood of seeing our training set, or rather of seeing the ground-truth labels of our training set. This is what this expression captures, because we don't like