Okay, hello, good evening.
I just want to repeat briefly what we did up to now in these lectures.
So first we defined what a neural network looks like.
For us, a neural network is just a very complicated function with many, many parameters that we can adapt, and the purpose is to adapt these parameters so that the network approximates as well as possible a given function that maps the input to the output.
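In symbols (a minimal sketch, in notation of my own choosing rather than the lecture slides'), the network is a map

$$ y^{\rm out} = F_\theta\big(y^{\rm in}\big), $$

where $\theta$ stands for the whole collection of weights (and biases), and training means adjusting $\theta$ so that $F_\theta$ comes as close as possible to the desired input-to-output function on the training data.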
The way to achieve this is to define some kind of cost function that measures the deviation between the desired outputs and the actual outputs of the network, and then to minimize this cost function by adapting all the weights, by tuning all the weights.
The way to do this is simply gradient descent.
You try to change the parameters of the network in just the right way so as to minimize the cost function.
This is known as stochastic gradient descent, simply because in principle the cost function depends on all possible training examples you could ever think of, but in practice you already want to take a training step after seeing only a handful of training examples, so what you actually minimize is only a stochastic sample of the true cost function.
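One update step can be sketched as follows (standard stochastic gradient descent with a learning rate $\eta$ and a minibatch of $N$ samples; the symbols are my own shorthand):

$$
\theta \;\leftarrow\; \theta - \eta\, \frac{\partial \tilde C}{\partial \theta},
\qquad
\tilde C = \frac{1}{N} \sum_{j=1}^{N} C\big(F_\theta(x_j),\, y^{\rm target}_j\big),
$$

where the sum runs only over the small random batch of training samples, which is what makes the gradient a stochastic estimate of the gradient of the true cost.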
Okay, then the question that immediately arises is: if we want to go this way and do gradient descent, how do we efficiently calculate the derivatives that we need?
These are the derivatives of the network's output, and thus of the cost function, with respect to all its parameters, and remember that a neural network can have thousands of neurons and even millions of parameters.
These parameters are the weights of the connections between the neurons, and taking a gradient with respect to so many variables has to be done efficiently; it turns out that this is not that hard.
The resulting technique is known as backpropagation, but as we learned last time, what you apply is basically high-school mathematics: you start by applying the chain rule repeatedly.
Then you realize that the result contains steps that can be interpreted as a kind of matrix-vector multiplication, which is admittedly a little beyond high school.
The way to interpret this is that you start from the top of the network and proceed down layer by layer, and in doing so you establish all these gradients.
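Written out in one common notation (a sketch only, assuming each layer $n$ computes $z^{(n)} = w^{(n)} y^{(n-1)} + b^{(n)}$ followed by $y^{(n)} = f(z^{(n)})$; the symbols are mine and may differ from the lecture slides), the recursion reads

$$
\delta^{(L)}_j = \frac{\partial C}{\partial y^{(L)}_j}\, f'\big(z^{(L)}_j\big),
\qquad
\delta^{(n-1)}_j = f'\big(z^{(n-1)}_j\big) \sum_k w^{(n)}_{kj}\, \delta^{(n)}_k,
\qquad
\frac{\partial C}{\partial w^{(n)}_{kj}} = \delta^{(n)}_k\, y^{(n-1)}_j .
$$

The step from $\delta^{(n)}$ to $\delta^{(n-1)}$ is exactly a multiplication by the transposed weight matrix, which is the matrix-vector structure just mentioned, and it is applied once per layer as you walk from the output down to the input.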
So this is what we discussed last time, and I'd say that at this point we have already gone through both the most important and the most difficult part of these lectures.
To remind you of how this works, here again is a graphic that I already displayed last time to illustrate backpropagation.
This is our network: each circle is a neuron, and it carries a value at each moment in time.
The lowest neurons form the input layer, into which you write the values that correspond to your input sample, for example the pixel values of an image.
Then, as you evaluate the network, you go from one layer to the next: you calculate linear superpositions, apply nonlinear functions, and do the same again in the next step, going layer by layer up to the final output layer.
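To make that forward pass concrete, here is a minimal NumPy sketch (the function names `forward` and `sigmoid` and the layer sizes are my own illustrative choices, not code from the lecture, and I assume a sigmoid nonlinearity):

```python
import numpy as np

def sigmoid(z):
    # a smooth nonlinear activation function
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Evaluate the network layer by layer.

    weights[n] has shape (neurons in layer n+1, neurons in layer n),
    biases[n] has shape (neurons in layer n+1,).
    """
    y = x  # values of the input-layer neurons, e.g. pixel values of an image
    for w, b in zip(weights, biases):
        z = w @ y + b      # linear superposition of the previous layer's values
        y = sigmoid(z)     # apply the nonlinear function
    return y               # values of the output-layer neurons

# Example: a small, randomly initialized network with layers of size 784, 30, 10
rng = np.random.default_rng(0)
weights = [rng.normal(size=(30, 784)), rng.normal(size=(10, 30))]
biases = [rng.normal(size=30), rng.normal(size=10)]
output = forward(rng.normal(size=784), weights, biases)
```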
Now this C here represents the cost function. C depends on the output: it compares the output of the neural network against the true output that you would want to have, so C is larger when things are not quite right.
What you want to do is calculate the derivative of this cost function with respect to all the parameters, which, remember, are all these connection strengths, the weights.
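For concreteness, one standard choice of cost function is the quadratic cost (used here as an illustrative example; the deviation can be measured in other ways as well):

$$ C = \frac{1}{2} \sum_j \big( y^{\rm out}_j - y^{\rm target}_j \big)^2, $$

which vanishes when the network output matches the desired output and grows the further the two deviate.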
Now if, for example, you wanted to calculate the derivative with respect to one of the weights belonging to one of the connections down here, between the input layer and the first hidden layer, then the chain of connections that goes from the output layer down to this one is pretty long.
That is reflected in the fact that, to calculate this gradient, you have to apply many steps of the chain rule to actually get from the output all the way down to where you want to change the weight.
So let's see how this works.
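Schematically, for a weight $w^{(1)}_{kj}$ in that lowest layer of an $L$-layer network (again just a sketch of the structure, in my own notation):

$$
\frac{\partial C}{\partial w^{(1)}_{kj}}
= \frac{\partial C}{\partial y^{(L)}}\,
  \frac{\partial y^{(L)}}{\partial y^{(L-1)}}\cdots
  \frac{\partial y^{(2)}}{\partial y^{(1)}}\,
  \frac{\partial y^{(1)}}{\partial w^{(1)}_{kj}},
$$

where the factors in the middle are to be read as matrices, summed over all intermediate neuron indices. Every layer between the weight and the output contributes one factor, and backpropagation evaluates this product starting from the output, reusing the partial products for all the other weights along the way.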