
Welcome back to deep learning and today we want to talk about deep reinforcement learning.

So I have a couple of slides for you and of course we want to build on the concepts that we've seen in reinforcement learning, but today we talk about deep Q-learning.

And one of the very well-known examples is the human-level control through deep reinforcement learning here in reference [4].

This was done by Google DeepMind and they showed that a neural network is able to play Atari games.

So the idea here is to directly learn the action value function using a deep network. The inputs are essentially a stack of subsequent video frames from the game; these are processed by the network, and it produces the best next action.

So the idea is now to use this deep reinforcement learning framework to learn the best next controller movements. They use convolutional layers for the frame processing and then fully connected layers for the final decision making, and here you see the main idea of the architecture.

So there are these convolutional layers with ReLUs: the input frames are processed by these, then you go into fully connected layers, again fully connected layers, and then you directly produce the output. You can see that in Atari games this is a very limited set of actions: you can do no action, move in essentially eight directions, press the fire button, or use the eight directions combined with the fire button.

So that's all of the different things that you can do; it's a limited domain and you can then train your system with that.
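To make this concrete, here is a minimal sketch of such a Q-network in PyTorch. The layer sizes roughly follow the DQN paper (84×84 grayscale frames, a stack of four frames, up to 18 actions), but the class name and hyperparameters are illustrative assumptions, not the original implementation.

```python
import torch
import torch.nn as nn

class AtariQNetwork(nn.Module):
    """Sketch of a DQN-style Q-network: convolutional layers for frame
    processing, fully connected layers for the final decision. Layer sizes
    are assumptions roughly following the DQN paper, not the original code."""

    def __init__(self, n_actions: int = 18, n_frames: int = 4):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(n_frames, 32, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2), nn.ReLU(),
            nn.Conv2d(64, 64, kernel_size=3, stride=1), nn.ReLU(),
        )
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 7 * 7, 512), nn.ReLU(),
            nn.Linear(512, n_actions),  # one action value estimate per action
        )

    def forward(self, frame_stack: torch.Tensor) -> torch.Tensor:
        # frame_stack: (batch, n_frames, 84, 84) -> (batch, n_actions)
        return self.fc(self.conv(frame_stack))
```

The greedy action for a given frame stack is then simply the argmax over these outputs.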

Well, it's a deep Q-network (DQN) that directly applies Q-learning.

The state of the game is essentially the current plus three previous frames as an image stack

so you have a rather fuzzy way of incorporating memory and state.

Then you have 18 outputs that are associated with the different actions, and each output estimates the action value for the given input.

You don't have a label and a cost function, but you update the weights such that the expected future reward is maximized.

There's a reward of plus one when the game score is increased and a reward of minus one when the game score is decreased; otherwise it's zero.
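As a small illustration, this reward clipping could be written as follows; the score arguments are hypothetical names, not taken from the original code.

```python
def clipped_reward(prev_score: int, new_score: int) -> int:
    """+1 if the game score increased, -1 if it decreased, 0 otherwise."""
    if new_score > prev_score:
        return 1
    if new_score < prev_score:
        return -1
    return 0
```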

They use an epsilon-greedy policy with epsilon decreasing to a low value during the training, they use a semi-gradient form of Q-learning to update the network weights w, and again they use mini-batches to accumulate the weight updates.
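A minimal sketch of such an epsilon-greedy action selection with a decaying epsilon might look like this; the schedule values are illustrative assumptions, not the exact ones from the paper.

```python
import random
import torch

def epsilon_by_step(step: int, eps_start: float = 1.0,
                    eps_end: float = 0.1, decay_steps: int = 1_000_000) -> float:
    """Linearly anneal epsilon from eps_start down to eps_end over decay_steps."""
    frac = min(step / decay_steps, 1.0)
    return eps_start + frac * (eps_end - eps_start)

def select_action(q_net, frame_stack: torch.Tensor, step: int,
                  n_actions: int = 18) -> int:
    """With probability epsilon take a random action, otherwise the greedy one."""
    if random.random() < epsilon_by_step(step):
        return random.randrange(n_actions)
    with torch.no_grad():
        q_values = q_net(frame_stack.unsqueeze(0))  # add a batch dimension
    return int(q_values.argmax(dim=1).item())
```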

So the action value network is updated using the following rule, and you can see that this is very close to what we have seen in the previous video: again, you have the weights and you update them with respect to the rewards.
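Written out, this semi-gradient Q-learning update has the following form (the notation follows Sutton and Barto's book cited in the references, so the exact symbols on the slide may differ slightly):

$$w \leftarrow w + \alpha \left[ R + \gamma \max_{a} \hat{Q}(s', a, w) - \hat{Q}(s, a, w) \right] \nabla_{w} \hat{Q}(s, a, w)$$

Here $\alpha$ is the learning rate, $R$ the clipped reward, $\gamma$ the discount factor, $s$ and $s'$ the current and next frame stack, and $\hat{Q}$ the action value estimate of the network.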

Now the problem is of course that this term with gamma and the selection of the maximum of the Q function is a function of the weights again.

So you somehow have a dependency between the maximization in there and the weights you're trying to update.

So your target changes simultaneously with the weights that we want to learn and this

can actually lead to oscillations or divergence of your weights.

So this is not very good.

So they introduce a second network, the so-called target network: after every C steps, they generate it by copying the weights of the action value network to a duplicate network and then keep them fixed.

So you use the output Q bar of the target network as the target to stabilize the maximization above.

So inside the target you don't use Q hat, the function that you're trying to learn, but Q bar, which is a kind of fixed version that you use for a couple of iterations.
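A minimal sketch of this, assuming the AtariQNetwork from above, could look like the following. The variable names and the smooth L1 loss are common implementation choices and assumptions for illustration, not necessarily the exact setup of the paper.

```python
import copy
import torch
import torch.nn.functional as F

q_net = AtariQNetwork()             # Q hat: the network being trained
target_net = copy.deepcopy(q_net)   # Q bar: frozen copy used for the target
target_net.eval()

def td_loss(states, actions, rewards, next_states, dones, gamma: float = 0.99):
    """Temporal-difference loss with the fixed target network Q bar."""
    q_values = q_net(states).gather(1, actions.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # The maximization uses the frozen Q bar, not the weights being updated.
        next_q = target_net(next_states).max(dim=1).values
        targets = rewards + gamma * (1.0 - dones) * next_q
    return F.smooth_l1_loss(q_values, targets)

def sync_target_network():
    """Every C update steps, copy the current weights into the target network."""
    target_net.load_state_dict(q_net.state_dict())
```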

Another trick they have been using is experience replay, and here the idea is to reduce the correlation between subsequent training samples by storing past transitions in a replay memory and sampling random mini-batches from it.
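A minimal sketch of such a replay memory, assuming transitions are stored as simple tuples, might look like this:

```python
import random
from collections import deque

class ReplayMemory:
    """Fixed-size buffer of past transitions; sampling random mini-batches
    from it breaks the correlation between subsequent training samples."""

    def __init__(self, capacity: int = 1_000_000):
        self.buffer = deque(maxlen=capacity)

    def push(self, state, action, reward, next_state, done):
        self.buffer.append((state, action, reward, next_state, done))

    def sample(self, batch_size: int = 32):
        return random.sample(self.buffer, batch_size)

    def __len__(self):
        return len(self.buffer)
```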


Deep Learning - Reinforcement Learning Part 5

In the last video on reinforcement learning, we look into deep reinforcement learning techniques. We start by looking into how DeepMind beat Atari games, in particular Breakout. Furthermore, we look into the technology behind AlphaGo and AlphaGo Zero to play Go, Chess, and Shogi at world-class level.


Links
Link to Sutton's Reinforcement Learning book in its 2018 draft, including deep Q-learning and AlphaGo details

Further Reading:
A gentle Introduction to Deep Learning

References
[1] David Silver, Aja Huang, Chris J Maddison, et al. “Mastering the game of Go with deep neural networks and tree search”. In: Nature 529.7587 (2016), pp. 484–489.
[2] David Silver, Julian Schrittwieser, Karen Simonyan, et al. “Mastering the game of go without human knowledge”. In: Nature 550.7676 (2017), p. 354.
[3] David Silver, Thomas Hubert, Julian Schrittwieser, et al. “Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm”. In: arXiv preprint arXiv:1712.01815 (2017).
[4] Volodymyr Mnih, Koray Kavukcuoglu, David Silver, et al. “Human-level control through deep reinforcement learning”. In: Nature 518.7540 (2015), pp. 529–533.
[5] Martin Müller. “Computer Go”. In: Artificial Intelligence 134.1 (2002), pp. 145–179.
[6] Richard S. Sutton and Andrew G. Barto. Introduction to Reinforcement Learning. 1st. Cambridge, MA, USA: MIT Press, 1998.
