OK, hello, good evening. This is the fourth lecture on machine learning. So first some
organizational points. At the end, in principle, you can take an exam. The question is: roughly how many of you would like to get a grade for this lecture? OK, so I think this is so many that I will replace the oral exams with an actual written exam. I hope that is OK with you.
We then still have to discuss when that will be. The other point is, on this Thursday,
there will be a kind of tutorial. So regardless of whether you have already done the homework or not, this Thursday here in this lecture hall at 6, Thomas Fersel will give a tutorial covering all the different homework problems that we did. So maybe you want to have a look at
them again. And then you can discuss these things, including all the nitty-gritty details of the
programming. OK, so let's start. Last time, we finally made it to the mountaintop. So we really
understood, or at least I told you about, backpropagation, which is the algorithm that you use to train neural networks. Once you know backpropagation, basically, you can do anything you want with neural networks. And so today, I want to do the following. I want to go through backpropagation once more, in a slightly different way, to tell you how it works and to remind you of these things. And then we want to apply it, for example, to compressing an image that we present to the network instead of a function. OK, so what I'm going to do now is give the large-scale overview. I will not care about all the tiny indices; rather, you should get an overview of what backpropagation really does. So remember, what we wanted to do is calculate
the gradient of the cost function with respect to some weight. All these network connections have
their weights. So this is what defines the network. And the cost function tells me the distance between what the network should do and what it currently actually does. And I want to
minimize this distance. And in order to do that, I have to calculate the gradient. So I want to
represent this pictorially, which I've tried to do here. This is my network with all the neurons
connected by the connections that have their associated weights. And at the final output neurons,
I placed a little symbol to denote the cost function, because in order to calculate the cost function, what you do is take the output values of the network and compare them to the ideal, correct output values for the particular input that you sent through the network. Then, for example, you take this difference squared to get a number that is never negative and that will decrease as you get better at training your network. That then gives you the cost function. The cost function is a single value; it's a scalar function.
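To write this down concretely: the cost described here is the usual quadratic cost. In the notation I will use below, with y_j for the network outputs and y_j^target for the desired outputs, and with a conventional factor of 1/2 that the lecture's blackboard notation may or may not include, it reads

    C = \frac{1}{2} \sum_j \bigl( y_j - y_j^{\mathrm{target}} \bigr)^2 ,

and the quantity backpropagation has to deliver is the gradient \partial C / \partial w for every weight w in the network. The 1/2 is only there so that the derivative with respect to an output comes out as the plain difference y_j - y_j^{\mathrm{target}} that appears as the first factor below.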
So now if I want to find out how small changes in the cost function are connected to small changes in the weight, I have to find my way through my network. And there will be a path through this network. Let me trace it. So first, when we take the actual derivative, we know that what we get is this difference
between the output of the network and the ideal, the correct output that I would like to have. And
so what I will now do is I will go through this network. I will follow a single path. And I will
show you all the factors that occur when you calculate this derivative. And because this is about memorizing things and not about looking at each and every single index, I will also omit the indices. This is OK because, in principle, you could deduce the indices from this graphic, from this image. For example, here, when I write down this factor, in
principle, the y should come with an index, which stands for this particular neuron. If I had placed
this thick black line at the first neuron, then the y would carry a different index relating to the
first neuron. So that's the way to read this. OK, so this is the first factor, but it occurs only
once when I take the derivative of my cost function. And then you want to see how any change in the input to this neuron affects its output value. And we already know that this means taking the derivative of the nonlinear function. This neuron gets as its input the weighted sum of several neurons in the layer below, and it spits out a nonlinear function f of z. So if I'm only interested in the tiny changes, I have to look at the derivative. And then I proceed.
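In the same improvised notation as above (again my shorthand; the symbols on the slide may differ), this neuron computes

    y = f(z), \qquad z = \sum_j w_j \, y_j^{\mathrm{below}} ,

so the factor it contributes to the chain rule is \partial y / \partial z = f'(z), and a small change in one of the inputs y_j^below reaches z through the factor w_j, which is exactly the weight factor discussed next.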
So in calculating this derivative, we have seen that any connection contributes a factor given by the weight of that connection. And then here it's the same game as before again, so I would have to calculate the derivative of f.
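To make this chain of factors concrete, here is a minimal sketch of the procedure in NumPy. It is not code from the lecture: it assumes a small fully connected network without biases, a sigmoid as the nonlinearity f, and the quadratic cost from above, and all the names (backprop, weights, deltas, and so on) are mine.

```python
import numpy as np

def f(z):
    """The nonlinear activation; a sigmoid is assumed here."""
    return 1.0 / (1.0 + np.exp(-z))

def f_prime(z):
    """Derivative of the activation -- the factor each neuron contributes."""
    s = f(z)
    return s * (1.0 - s)

def backprop(x, y_target, weights):
    """Gradient of the quadratic cost with respect to every weight matrix.

    weights[l] is the matrix connecting layer l to layer l + 1 (no biases here).
    """
    # Forward pass: remember the weighted sums z and the outputs y of every layer.
    ys, zs = [x], []
    for W in weights:
        z = W @ ys[-1]        # weighted sum of the layer below
        zs.append(z)
        ys.append(f(z))       # the neuron spits out f(z)

    # First factor: difference between the network output and the desired
    # output, times f'(z) of the output neurons.
    delta = (ys[-1] - y_target) * f_prime(zs[-1])

    grads = [None] * len(weights)
    for l in reversed(range(len(weights))):
        # Gradient for this weight matrix: delta times the activations below it.
        grads[l] = np.outer(delta, ys[l])
        if l > 0:
            # Same game one layer further down: multiply by the connection
            # weights and by f'(z) of that layer.
            delta = (weights[l].T @ delta) * f_prime(zs[l - 1])
    return grads
```

The variable delta collects exactly the running product of factors described above: it starts as the difference between the network output and the target times f'(z), and each step down multiplies in the connection weights and the next f'(z). For a tiny example you could call backprop with x = np.array([0.5, -1.0, 2.0]), y_target = np.array([1.0]), and two random weight matrices of shapes (2, 3) and (1, 2).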