Okay, hello, good evening.
I just want to repeat briefly what we did up to now in these lectures.
So first we defined what a neural network looks like.
For us, a neural network is just a very complicated function with many, many parameters that we can adapt, and the purpose is to adapt these parameters so that the network approximates as well as possible a given function that maps the input to the output.
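In symbols (a minimal sketch, in notation of my own choosing rather than the lecture slides'), the network is a map

$$ y^{\rm out} = F_\theta\big(y^{\rm in}\big), $$

where $\theta$ stands for the whole collection of weights (and biases), and training means adjusting $\theta$ so that $F_\theta$ comes as close as possible to the desired input-to-output function on the training data.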
The way to achieve this is to define some kind of cost function that measures the deviation between the desired outputs and the actual outputs of the network, and then to minimize this cost function by adapting all the weights, by tuning all the weights.
The way to do this is simply gradient descent.
You try to change the parameters of the network in just the right way so as to minimize the cost function.
This is known as stochastic gradient descent, simply because in principle the cost function depends on all possible training examples you could ever think of, but in practice you already want to take a training step after seeing only a handful of training examples, so what you actually minimize is only a stochastic sample of the true cost function.
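One update step can be sketched as follows (standard stochastic gradient descent with a learning rate $\eta$ and a minibatch of $N$ samples; the symbols are my own shorthand):

$$
\theta \;\leftarrow\; \theta - \eta\, \frac{\partial \tilde C}{\partial \theta},
\qquad
\tilde C = \frac{1}{N} \sum_{j=1}^{N} C\big(F_\theta(x_j),\, y^{\rm target}_j\big),
$$

where the sum runs only over the small random batch of training samples, which is what makes the gradient a stochastic estimate of the gradient of the true cost.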
Okay, then the question that immediately arises is: if we want to go this way and do gradient descent, how do we efficiently calculate the derivatives that we need?
These are the derivatives of the network's output, and thus of the cost function, with respect to all its parameters, and remember that a neural network can have thousands of neurons and even millions of parameters.
These parameters are the weights of the connections between the neurons, and taking a gradient with respect to so many variables has to be done efficiently; it turns out that this is not that hard.
The resulting technique is known as backpropagation, but as we learned last time, what you apply is basically high-school mathematics: you start by applying the chain rule repeatedly.
Then you realize that the result contains steps that can be interpreted as a kind of matrix-vector multiplication, which is admittedly a little beyond high school.
The way to interpret this is that you start from the top of the network and proceed down layer by layer, and in doing so you establish all these gradients.
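Written out in one common notation (a sketch only, assuming each layer $n$ computes $z^{(n)} = w^{(n)} y^{(n-1)} + b^{(n)}$ followed by $y^{(n)} = f(z^{(n)})$; the symbols are mine and may differ from the lecture slides), the recursion reads

$$
\delta^{(L)}_j = \frac{\partial C}{\partial y^{(L)}_j}\, f'\big(z^{(L)}_j\big),
\qquad
\delta^{(n-1)}_j = f'\big(z^{(n-1)}_j\big) \sum_k w^{(n)}_{kj}\, \delta^{(n)}_k,
\qquad
\frac{\partial C}{\partial w^{(n)}_{kj}} = \delta^{(n)}_k\, y^{(n-1)}_j .
$$

The step from $\delta^{(n)}$ to $\delta^{(n-1)}$ is exactly a multiplication by the transposed weight matrix, which is the matrix-vector structure just mentioned, and it is applied once per layer as you walk from the output down to the input.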
So this is what we discussed last time, and I'd say that at this point we have already gone through both the most important and the most difficult part of these lectures.
To remind you of how this works, here again is a graphic that I already displayed last time to illustrate backpropagation.
This is our network: each circle is a neuron, and it carries a value at each moment in time.
The lowest neurons form the input layer, into which you write the values that correspond to your input sample, for example the pixel values of an image.
Then, as you evaluate the network, you go from one layer to the next: you calculate linear superpositions, apply nonlinear functions, and do the same again in the next step, going layer by layer up to the final output layer.
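To make that forward pass concrete, here is a minimal NumPy sketch (the function names `forward` and `sigmoid` and the layer sizes are my own illustrative choices, not code from the lecture, and I assume a sigmoid nonlinearity):

```python
import numpy as np

def sigmoid(z):
    # a smooth nonlinear activation function
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Evaluate the network layer by layer.

    weights[n] has shape (neurons in layer n+1, neurons in layer n),
    biases[n] has shape (neurons in layer n+1,).
    """
    y = x  # values of the input-layer neurons, e.g. pixel values of an image
    for w, b in zip(weights, biases):
        z = w @ y + b      # linear superposition of the previous layer's values
        y = sigmoid(z)     # apply the nonlinear function
    return y               # values of the output-layer neurons

# Example: a small, randomly initialized network with layers of size 784, 30, 10
rng = np.random.default_rng(0)
weights = [rng.normal(size=(30, 784)), rng.normal(size=(10, 30))]
biases = [rng.normal(size=30), rng.normal(size=10)]
output = forward(rng.normal(size=784), weights, biases)
```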
Now this C here represents the cost function. C depends on the output: it compares the output of the neural network against the true output that you would want to have, so C is larger when things are not quite right.
What you want to do is calculate the derivative of this cost function with respect to all the parameters, which, remember, are all these connection strengths, the weights.
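For concreteness, one standard choice of cost function is the quadratic cost (used here as an illustrative example; the deviation can be measured in other ways as well):

$$ C = \frac{1}{2} \sum_j \big( y^{\rm out}_j - y^{\rm target}_j \big)^2, $$

which vanishes when the network output matches the desired output and grows the further the two deviate.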
Now if, for example, you wanted to calculate the derivative with respect to one of the weights belonging to one of the connections down here, between the input layer and the first hidden layer, then the chain of connections that goes from the output layer down to this one is pretty long.
That is reflected in the fact that, to calculate this gradient, you have to apply many steps of the chain rule to actually get from the output all the way down to where you want to change the weight.
So let's see how this works.
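Schematically, for a weight $w^{(1)}_{kj}$ in that lowest layer of an $L$-layer network (again just a sketch of the structure, in my own notation):

$$
\frac{\partial C}{\partial w^{(1)}_{kj}}
= \frac{\partial C}{\partial y^{(L)}}\,
  \frac{\partial y^{(L)}}{\partial y^{(L-1)}}\cdots
  \frac{\partial y^{(2)}}{\partial y^{(1)}}\,
  \frac{\partial y^{(1)}}{\partial w^{(1)}_{kj}},
$$

where the factors in the middle are to be read as matrices, summed over all intermediate neuron indices. Every layer between the weight and the output contributes one factor, and backpropagation evaluates this product starting from the output, reusing the partial products for all the other weights along the way.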