Welcome back to deep learning and today we want to talk about deep reinforcement learning.
So I have a couple of slides for you, and of course we want to build on the concepts that
we've seen in reinforcement learning, but today we talk about deep Q-learning.
And one of the very well-known examples is the human-level control through deep reinforcement
learning in reference [4].
This was done by Google DeepMind, and they showed that a neural network is able to play Atari
games.
So the idea here is to directly learn the action value function using a deep network:
the inputs are essentially subsequent video frames from the game, these are processed
by a deep network, and it produces the best next action.
So the idea is now to use this deep reinforcement learning framework to learn the best next controller
movement. They use convolutional layers for the frame processing and then fully connected
layers for the final decision-making, and here you see the main idea of the architecture.
So there are these convolutional layers with ReLUs; the input frames are processed
by these, then you go into fully connected layers and again fully connected layers, and
then you directly produce the output. You can see that in Atari games this is a very
limited set: you can either do no action, then there are essentially eight directions,
there's a fire button, and there are the eight directions plus the fire button.
So that's all of the different things that you can do. It's a limited domain, and you
can then train your system with that.
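Just to make the described architecture more concrete, here is a rough sketch in PyTorch. Only the overall structure, convolutions with ReLUs followed by fully connected layers and 18 outputs, comes from the lecture; the filter sizes, strides, and the 84×84 input resolution follow my reading of reference [4] and are meant as an illustration, not as the authors' exact implementation.

```python
import torch
import torch.nn as nn

class AtariQNetwork(nn.Module):
    """Maps a stack of game frames to one estimated Q value per possible action."""

    def __init__(self, n_actions=18, n_frames=4):
        super().__init__()
        # Convolutional layers process the stacked input frames.
        self.features = nn.Sequential(
            nn.Conv2d(n_frames, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        # Fully connected layers make the final decision over the 18 actions.
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),
        )

    def forward(self, frames):
        # frames: tensor of shape (batch, n_frames, 84, 84), scaled to [0, 1]
        return self.head(self.features(frames))
```

Picking the next controller movement then simply amounts to taking the argmax over the 18 outputs.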
Well, it's a deep Q network that directly applies Q-learning.
The state of the game is essentially the current plus the three previous frames as an image stack,
so you have a rather fuzzy way of incorporating memory into the state.
Then you have 18 outputs that are associated with the different actions, and each output
estimates the action value for the given input.
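To make this image-stack state concrete, here is a minimal sketch of how one could assemble it; the function name and the assumption of already preprocessed grayscale frames are my own placeholders.

```python
import numpy as np

frame_history = []  # most recent preprocessed game frames, oldest first

def make_state(new_frame):
    """Return the current plus the three previous frames as one image stack."""
    frame_history.append(new_frame)
    del frame_history[:-4]                         # keep only the last four frames
    while len(frame_history) < 4:                  # at the start of an episode,
        frame_history.insert(0, frame_history[0])  # repeat the oldest frame
    return np.stack(frame_history)                 # shape: (4, height, width)
```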
You don't have a label and a cost function, but you update with respect to maximizing the
future reward.
There's a reward of plus one when the game score increases and a reward of minus one
when the game score decreases; otherwise it's zero.
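This reward scheme is easy to write down explicitly; a small sketch:

```python
def clipped_reward(score_before, score_after):
    """+1 if the game score increased, -1 if it decreased, 0 otherwise."""
    if score_after > score_before:
        return 1
    if score_after < score_before:
        return -1
    return 0
```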
They use an epsilon-greedy policy with epsilon decreasing to a low value during training,
and they use a semi-gradient form of Q-learning to update the network weights w. Again,
they use mini-batches to accumulate the weight updates.
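A small sketch of the epsilon-greedy action selection, assuming the q_network from the sketch above and a state tensor of stacked frames; the decay schedule for epsilon is not shown here, the current value is simply passed in.

```python
import random
import torch

def epsilon_greedy_action(q_network, state, epsilon, n_actions=18):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if random.random() < epsilon:
        return random.randrange(n_actions)        # explore
    with torch.no_grad():
        q_values = q_network(state.unsqueeze(0))  # add a batch dimension
    return int(q_values.argmax(dim=1).item())     # exploit: best estimated action
```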
So the action value network is updated using the following rule, and you can see that this
is very close to what we have seen in the previous video: again you have
the weights, and you update them with respect to the rewards.
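The rule in question is the semi-gradient Q-learning update from the previous video; written out as a reconstruction in the notation of Sutton and Barto [6], with learning rate α and discount factor γ, it reads:

$$ w \leftarrow w + \alpha \bigl[ R_{t+1} + \gamma \max_a \hat{q}(S_{t+1}, a, w) - \hat{q}(S_t, A_t, w) \bigr] \nabla_w \hat{q}(S_t, A_t, w) $$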
Now the problem is of course that this gamma-discounted maximum over the Q function
is a function of the weights again.
So you somehow have a dependency between the maximization in there and the weights you're trying to
update.
So your target changes simultaneously with the weights that we want to learn, and this
can actually lead to oscillations or divergence of your weights.
So this is not very good.
So they introduce a second, so-called target network: after C steps they generate it
by copying the weights of the action value network to a duplicate network, and then they keep
these copied weights fixed.
So you use the output Q bar of this target network as the target to stabilize the previous
maximization.
So you don't use Q hat, the function that you're trying to learn, but Q bar, which
is the kind of fixed version that you keep for a couple of iterations.
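With the target network in place, the maximization inside the target uses the frozen copy of the weights; calling that copy w⁻ (updated only every C steps), the update rule from above becomes, again as my reconstruction in the same notation:

$$ w \leftarrow w + \alpha \bigl[ R_{t+1} + \gamma \max_a \bar{q}(S_{t+1}, a, w^-) - \hat{q}(S_t, A_t, w) \bigr] \nabla_w \hat{q}(S_t, A_t, w) $$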
Another trick they have been using is experience replay, and here the idea is to reduce the
correlation between subsequent training samples: transitions are stored in a replay memory,
and the mini-batches are drawn from this memory rather than only from the most recent experience.
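A compact sketch of such a replay memory; the capacity and batch size here are placeholders, not the values from the paper.

```python
import random
from collections import deque

class ReplayMemory:
    """Stores past transitions and samples random mini-batches from them."""

    def __init__(self, capacity=100_000):
        self.buffer = deque(maxlen=capacity)  # oldest transitions are dropped automatically

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size=32):
        # Random sampling breaks the correlation between consecutive game frames.
        return random.sample(self.buffer, batch_size)
```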
References
[1] David Silver, Aja Huang, Chris J Maddison, et al. “Mastering the game of Go with deep neural networks and tree search”. In: Nature 529.7587 (2016), pp. 484–489.
[2] David Silver, Julian Schrittwieser, Karen Simonyan, et al. “Mastering the game of Go without human knowledge”. In: Nature 550.7676 (2017), p. 354.
[3] David Silver, Thomas Hubert, Julian Schrittwieser, et al. “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm”. In: arXiv preprint arXiv:1712.01815 (2017).
[4] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, et al. “Human-level control through deep reinforcement learning”. In: Nature 518.7540 (2015), pp. 529–533.
[5] Martin Müller. “Computer Go”. In: Artificial Intelligence 134.1 (2002), pp. 145–179.
[6] Richard S. Sutton and Andrew G. Barto. Introduction to Reinforcement Learning. 1st. Cambridge, MA, USA: MIT Press, 1998.