Okay, hello, good evening. Let me first recall what we did last time. We started discussing recurrent neural networks, that is, networks that have memory. The situation is depicted here: the input to your network is a time sequence. Maybe at some point in that sequence there is some kind of signal, then for a long time there are no important signals, and suddenly another signal arrives that refers back to whatever the network received in earlier time slots. So the network really needs a memory to meet this challenge. A typical example would be that you tell the network some story, which is a sequence of words, and at some point you ask a question about the story that it has to answer; obviously it then needs a memory. The solution is provided by these recurrent neural networks.
You can imagine it like this: here you have the input, and each of these blue circles could, in the simplest case, be just one value, one input neuron, and likewise each blue circle labeled output would be just one neuron, one value. In a more complicated setting, each of these blue circles could be a whole layer of neurons. What the picture depicts is the progression in time, and the network itself really does not change. In this simple case the network would just be this combination of input and output neuron, or input and output layer, but I draw it several times: at each new time step I draw the same network again. The reason I do this is that I want to be able to depict how the network keeps memory, that is, how some of the values of an earlier time step keep influencing the calculation of the values at the next time step.

So the slide just shows that each of these blue circles can in reality be a whole layer of neurons, and of course we can also play the same trick as before and introduce hidden layers. The network would then consist of an input layer, a hidden layer, and an output layer, and I draw many, many copies of it, one for each time step. Of course, for each time step the input is different, which means all the values in the whole network will be different for this reason alone. In addition, the memory it receives from earlier times will be different, and that too can make the output of the network different.
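To make this concrete, here is a minimal sketch of such an unrolled recurrent network in Python/NumPy. This is not code from the lecture; the names W_in, W_rec, W_out, the layer sizes, and the choice of a tanh nonlinearity are my own. The point is only that one and the same set of weights is applied at every time step, and that the hidden state h is what carries the memory from one step to the next.

```python
# Minimal sketch of an unrolled recurrent network (illustration only,
# not the lecture's code; all names and sizes are my own choices).
import numpy as np

rng = np.random.default_rng(0)

n_in, n_hidden, n_out, T = 3, 8, 2, 10      # layer sizes and number of time steps

# One single set of weights, reused at every time step:
W_in  = rng.normal(scale=0.5, size=(n_hidden, n_in))
W_rec = rng.normal(scale=0.5, size=(n_hidden, n_hidden))
W_out = rng.normal(scale=0.5, size=(n_out, n_hidden))

x = rng.normal(size=(T, n_in))              # the input time sequence
h = np.zeros(n_hidden)                      # the hidden state = the "memory"

for t in range(T):
    # The new hidden state depends on the current input AND on the previous
    # hidden state, so values from earlier time steps keep influencing the
    # calculation at later time steps.
    h = np.tanh(W_in @ x[t] + W_rec @ h)

y = W_out @ h                               # output read out at the final time step
print(y)
```

Training would then consist of adjusting W_in, W_rec, and W_out by gradient descent, which is exactly what we discuss next.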
So the question is how learning proceeds, and again the principle is the same as usual: you supply the correct answer to the network together with the input, and then you want to make sure that the network's output comes closer to this correct answer. Here I depicted a situation where inputs are given during all the time steps but the output is only read out at the very final time. At that final time the correct output is known; the true output of the neural network may deviate from this correct output, so you calculate the deviation and try to minimize it by taking the gradient. What you really want to do is adapt all the weights in the whole network in order to get the right output. So when you now do backpropagation, not only do you have to take gradients down towards the lower layers of the network, which was the case all along, but when you calculate these gradients you also have to go back in time, because what the network computed at earlier times influences the final result. And so you have to try to adapt all these weights.
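To spell this out in formulas (the notation here is my own, not taken from the slide): call the hidden-layer values at time step t the vector h_t, let W stand for one of the shared weight matrices, and let the cost L be evaluated only at the final time T. Backpropagation through time is then just the chain rule applied along the unrolled network,

\[
\frac{\partial L}{\partial W}
= \sum_{t=1}^{T}
\frac{\partial L}{\partial h_T}\,
\frac{\partial h_T}{\partial h_t}\,
\frac{\partial h_t}{\partial W},
\qquad
\frac{\partial h_T}{\partial h_t}
= \frac{\partial h_T}{\partial h_{T-1}}\,
\frac{\partial h_{T-1}}{\partial h_{T-2}}\cdots
\frac{\partial h_{t+1}}{\partial h_t},
\]

where the last factor in each term of the sum means the direct dependence of h_t on W at that particular time step. Each term is therefore a contribution that has to be carried backwards, step by step, through all the intermediate time steps.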
Now there is a funny thing here. Remember, this is not, say, ten different networks somehow connected together; it is a single network which I draw repeatedly for the different time steps. That means the weights are the same for all time steps; the weights do not change in time. So if you find out during backpropagation that you should change this weight a little bit in one particular direction, but over here you should change the same weight in some other direction, what you will end up doing is the following: the weights that belong to this particular arrow connecting the hidden layer and the output layer are changed according to, say, the average, or the sum, of all these proposed changes.
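Written as a formula (again in my own notation): if g_t denotes the change proposed for a shared weight matrix W by the copy of the network at time step t, the single update that is actually applied is

\[
W \;\longrightarrow\; W - \eta \sum_{t=1}^{T} g_t ,
\]

with learning rate \eta; taking the average of the g_t instead of their sum merely rescales the learning rate by 1/T.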
Okay, so now there was a fundamental challenge here, and that was the exploding or vanishing gradients problem. This occurs not only for backpropagation through time, but in principle also whenever you have a network with very many layers and you want to backpropagate through all of them.
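A quick heuristic (in the notation introduced above) shows where the problem comes from: going back over T - t time steps, or equivalently down through many layers, multiplies the gradient by one factor per step, so with a typical factor of size \lambda one gets

\[
\left\| \frac{\partial h_T}{\partial h_t} \right\| \;\sim\; \lambda^{\,T-t},
\]

which shrinks exponentially when \lambda < 1 (vanishing gradients) and grows exponentially when \lambda > 1 (exploding gradients).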