6 - Machine Learning for Physicists [ID:11537]

Okay, hello, good evening. Let me first recall what we did last time. We started discussing recurrent neural networks, that is, networks that have memory. The situation is depicted here: as input to your network you would have a time sequence. Maybe at some point in the time sequence there is a kind of signal, then for a long time there are no important signals, and suddenly there is another signal that references back to whatever the network received in earlier time slots. So the network really needs a memory to meet this challenge. A typical example would be if you tell the network some story, which is a sequence of words, and then at some point you ask a question about the story and it has to reply; obviously it then needs a memory. And so the solution is these recurrent neural networks.
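
To make this concrete, here is a small Python sketch of a toy version of the recall task just described. The construction (a value at the first time step, a long gap of zeros, a "question" flag at the last step) is my own illustration and not taken from the lecture.

import numpy as np

def make_batch(batchsize=32, T=30):
    # channel 0 carries a value, channel 1 carries the "question" flag
    x = np.zeros((batchsize, T, 2))
    values = np.random.uniform(-1.0, 1.0, size=batchsize)
    x[:, 0, 0] = values      # the signal at the very first time step
    x[:, -1, 1] = 1.0        # the question at the very last time step
    y = values               # correct answer: recall the first value
    return x, y

x, y = make_batch()
print(x.shape, y.shape)      # (32, 30, 2) (32,)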

You can imagine that here you have the input: each of these blue circles could, in the simplest case, be just one value, one input neuron, and the blue circles labeled output would then also be just one neuron, one value. But in a more complicated setting, each of these blue circles could be a whole layer of neurons. What this picture depicts is actually the progression in time, and so the network really doesn't change; the network would just be this combination, in this simple case, of input and output neuron, or input and output layer. But I draw it several times, so at each new time step I again draw the same network. The reason I do this is that I want to be able to depict how the network keeps memory, that is, how it keeps some of the values of an earlier time step influencing the calculation of the values for the next time step. So the slide just depicts that each of these blue circles can in reality be a whole layer of neurons, and of course we can also play the same trick as before: we can have hidden layers. Now the network would consist of an input layer, a hidden layer, and an output layer, and I draw many, many copies of the network corresponding to the different time steps. Of course, for each time step the input is different, so all the values in the whole network will be different for this reason alone. In addition, the memory it receives from earlier times will be different, and that can also make the output of the network different.
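
As a minimal sketch of this unrolled picture (with made-up sizes, not taken from the lecture), here is a forward pass in Python: one single set of weights, reused at every time step, with a hidden state that carries the memory forward.

import numpy as np

rng = np.random.default_rng(0)
n_in, n_hidden, n_out, T = 1, 4, 1, 10

# one set of weights, applied at every time step
W_in = rng.normal(size=(n_hidden, n_in)) * 0.5       # input  -> hidden
W_rec = rng.normal(size=(n_hidden, n_hidden)) * 0.5  # hidden -> hidden (the memory)
W_out = rng.normal(size=(n_out, n_hidden)) * 0.5     # hidden -> output

x = rng.normal(size=(T, n_in))   # the input time sequence
h = np.zeros(n_hidden)           # hidden state, carries the memory

outputs = []
for t in range(T):
    # the new hidden values depend on the current input AND on the
    # hidden values of the previous time step
    h = np.tanh(W_in @ x[t] + W_rec @ h)
    outputs.append(W_out @ h)

print(np.array(outputs).shape)   # (T, 1): one output value per time step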

So the question is how learning proceeds, and again the principle is the same as usual: you supply the correct answer to the network together with the input, and then you want to make sure that the network's output comes closer to this correct answer. Here I depicted a situation where inputs are given during all the time steps, but the output is only read out at the very final time. The correct output is known; maybe the true output of the neural network deviates from this correct output, so you calculate the deviation and try to minimize it by taking the gradient. That means what you really want to do is adapt all the weights in the whole network in order to get the right output. And so when you now do backpropagation, not only do you have to take gradients down towards the lower layers in the network, which was the case all the time already, but when you calculate these gradients you also have to go back in time, because the output of earlier times influences the final result. And so you have to try to adapt all these weights.
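
Here is how this training setup might look in code. I am assuming a Keras/TensorFlow environment, which the transcript itself does not specify; this is a sketch, not the lecture's own code. The recurrent layer returns only its final state, the dense layer reads out the answer at the final time, and fitting with a squared-error loss performs exactly the gradient descent via backpropagation through time described above.

import numpy as np
from tensorflow import keras

T, n_in = 20, 1
model = keras.Sequential([
    keras.Input(shape=(T, n_in)),
    keras.layers.SimpleRNN(16),   # returns only the hidden state at the final time step
    keras.layers.Dense(1)         # the output, read out at the final time
])
model.compile(optimizer="adam", loss="mean_squared_error")

# toy task in the spirit of the recall example above:
# the correct answer is the value from the very first time step
x = np.random.randn(500, T, n_in)
y = x[:, 0, :]                    # shape (500, 1)

model.fit(x, y, batch_size=32, epochs=10)   # gradient descent = backprop through time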

Now there's a funny thing here. Remember, this is not, I don't know, ten different networks somehow connected together; it is only a single network which I draw repeatedly for the different time steps. That means the weights are the same for all time steps; the weights do not change in time. So if you find out during your backpropagation that you should change this weight a little bit in some particular direction, but over here you should change the same weight in some different direction, what you will end up doing is changing the weights that belong to this particular arrow, the one connecting the hidden layer and the output layer, according to, say, the average, or the sum, of all these proposed changes.
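
To spell out this "sum of all proposed changes" in code, here is a hand-written sketch of backpropagation through time for a tiny network like the one above (again my own illustration, with made-up sizes): the gradient signal is passed back step by step, and because the same weight matrices are used at every time step, their gradient contributions are simply accumulated with +=.

import numpy as np

rng = np.random.default_rng(1)
n_in, n_hidden, T = 1, 4, 10
W_in = rng.normal(size=(n_hidden, n_in)) * 0.5
W_rec = rng.normal(size=(n_hidden, n_hidden)) * 0.5
W_out = rng.normal(size=(1, n_hidden)) * 0.5

x = rng.normal(size=(T, n_in))
target = 1.0

# forward pass, storing the hidden states for the backward pass
hs = [np.zeros(n_hidden)]
for t in range(T):
    hs.append(np.tanh(W_in @ x[t] + W_rec @ hs[-1]))
y = W_out @ hs[-1]                      # output read out at the final time only

# backward pass: go back in time, summing the gradient contributions
dy = y - target                         # derivative of the loss 0.5*(y - target)**2
dW_out = np.outer(dy, hs[-1])
dW_in = np.zeros_like(W_in)
dW_rec = np.zeros_like(W_rec)
dh = W_out.T @ dy
for t in reversed(range(T)):
    da = dh * (1.0 - hs[t + 1] ** 2)    # through the tanh nonlinearity
    dW_in += np.outer(da, x[t])         # same W_in at every step: contributions add up
    dW_rec += np.outer(da, hs[t])       # same W_rec at every step: contributions add up
    dh = W_rec.T @ da                   # pass the gradient one step further back in time

print(dW_rec.shape, np.linalg.norm(dh)) # (4, 4) and the size of the backpropagated signal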

Okay, so now there was a fundamental challenge here, and that was the exploding-gradients or vanishing-gradients problem. It occurs not only for backpropagation through time but in principle also if you have a network with many, many layers and then you want to backpropagate
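
A tiny numerical illustration of that problem (my own sketch, not part of the lecture): during backpropagation through time the gradient signal is multiplied by the transposed recurrent weight matrix once per time step, so depending on the size of the weights its norm shrinks or grows roughly exponentially with the number of steps.

import numpy as np

rng = np.random.default_rng(2)
n_hidden, T = 32, 50

for scale in [0.5, 1.0, 1.5]:                      # small / moderate / large weights
    W_rec = scale * rng.normal(size=(n_hidden, n_hidden)) / np.sqrt(n_hidden)
    grad = np.ones(n_hidden)
    for _ in range(T):
        grad = W_rec.T @ grad                      # one backward step in time
    print(f"scale={scale}: |grad| after {T} steps = {np.linalg.norm(grad):.3e}")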

Part of a video series
Accessible via: Open access
Duration: 01:21:48 min
Recording date: 2019-06-03
Uploaded on: 2019-06-04 04:39:03
Language: en-US

This is a course introducing modern techniques of machine learning, especially deep neural networks, to an audience of physicists. Neural networks can be trained to perform diverse challenging tasks, including image recognition and natural language processing, just by training them on many examples. Neural networks have recently achieved spectacular successes, with their performance often surpassing humans. They are now also being considered more and more for applications in physics, ranging from predictions of material properties to analyzing phase transitions. We will cover the basics of neural networks, convolutional networks, autoencoders, restricted Boltzmann machines, and recurrent neural networks, as well as the recently emerging applications in physics.

Tags: vectors, memory, sequence, batchsize