42 - Deep Learning - Plain Version 2020 [ID:21176]


Welcome back to Deep Learning.

So today we want to talk a little bit about the so-called Markov decision process, which

is the foundation of reinforcement learning.

I brought a couple of slides for you so that we can see what we are actually talking about,

and you will see that the topic now really is reinforcement learning.

We want to learn how to play games and how to play them really well, and the key element

will be the Markov decision process.

So we have to extend the multi-armed bandit problem that we talked about in the previous

video, and we have to introduce a state s_t of the world, so the world now has a state,

and the rewards now depend on both the action and the state of the world.

So depending on the state of the world, actions may produce a very different reward, and we

can encode this again in a probability density function, as you can see here on the slide.

What else?

Well, this setting is known as the contextual bandit. In the full reinforcement learning

problem, the actions also influence the state.

So we will see that whatever action I take will also have an effect on the state, and

this may also be probabilistic, so we can describe it in another probability density

function.
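
The two distributions just described can be written down compactly. This is only a sketch in an assumed notation: the symbols s_t, a_t, and r_{t+1}, and the convention that the reward carries the index t+1, are not taken from the slide. For finite sets, these are probability mass functions rather than densities.

```latex
% Assumed notation: s_t = state, a_t = action, r_{t+1} = reward received after acting.
r_{t+1} \sim p(r_{t+1} \mid s_t, a_t),
\qquad
s_{t+1} \sim p(s_{t+1} \mid s_t, a_t)
```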

This then leads us to the so-called Markov decision process. Markov decision processes

take the following form. You have an agent, and the agent here on top is performing

actions a_t, and these actions have an influence on the environment.

The environment then generates rewards, as in the multi-armed

bandit problem, but it also changes the state.

So now our actions and the state are related to each other, and they are of course dependent

because we have a state transition probability density function that will cause the state

to be altered depending on the previous state and the action that was taken.

Now this transition also produces a reward, and this reward depends on the state

and the action that was taken. Otherwise, it's very similar to what we've already seen in

the multi-armed bandit problem.
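
The interaction between agent and environment described above can be sketched as a small loop. This is a minimal illustration under stated assumptions, not the setup from the lecture: the two states, the two actions, all probabilities and rewards, and the names `TRANSITIONS`, `REWARDS`, `step`, and `run_episode` are made up, and the rewards are kept deterministic for simplicity.

```python
import random

# Hypothetical two-state, two-action MDP; every number here is made up.
STATES = ["s0", "s1"]
ACTIONS = ["stay", "switch"]

# TRANSITIONS[(state, action)] -> list of (next_state, probability)
TRANSITIONS = {
    ("s0", "stay"): [("s0", 0.9), ("s1", 0.1)],
    ("s0", "switch"): [("s0", 0.2), ("s1", 0.8)],
    ("s1", "stay"): [("s1", 0.9), ("s0", 0.1)],
    ("s1", "switch"): [("s1", 0.2), ("s0", 0.8)],
}

# Reward for each (state, action) pair (assumed deterministic in this sketch).
REWARDS = {
    ("s0", "stay"): 0.0,
    ("s0", "switch"): 1.0,
    ("s1", "stay"): 2.0,
    ("s1", "switch"): 0.0,
}


def step(state, action, rng):
    """Environment: sample the next state and return it with the reward."""
    next_states, probs = zip(*TRANSITIONS[(state, action)])
    next_state = rng.choices(next_states, weights=probs, k=1)[0]
    return next_state, REWARDS[(state, action)]


def run_episode(num_steps, seed=0):
    """Agent-environment loop with an agent that acts uniformly at random."""
    rng = random.Random(seed)
    state = "s0"
    trajectory = []
    for _ in range(num_steps):
        action = rng.choice(ACTIONS)                   # agent picks a_t
        next_state, reward = step(state, action, rng)  # environment responds
        trajectory.append((state, action, reward, next_state))
        state = next_state                             # the state has changed
    return trajectory


if __name__ == "__main__":
    for transition in run_episode(5):
        print(transition)
```

The agent here ignores the state entirely, which is exactly the multi-armed bandit behavior; the only new part is that the environment keeps a state and changes it in response to the action.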

Of course we need policies, and the policies now also depend on the state because

you want to look at the state in order to pick your action and not just rely on

prior knowledge and choose the actions independently of the state. So all of them are expanded

accordingly.
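
A state-dependent policy can be sketched as one action distribution per state. The table below is an assumption for illustration only: the state and action names, the probabilities, and the helpers `sample_action` and `greedy_action` are hypothetical and do not come from the lecture.

```python
import random

# Hypothetical stochastic policy pi(a | s): one action distribution per state.
POLICY = {
    "s0": {"stay": 0.1, "switch": 0.9},
    "s1": {"stay": 0.8, "switch": 0.2},
}


def sample_action(policy, state, rng):
    """Draw an action from the distribution the policy assigns to `state`."""
    actions = list(policy[state].keys())
    weights = list(policy[state].values())
    return rng.choices(actions, weights=weights, k=1)[0]


def greedy_action(policy, state):
    """Return the most probable action in `state`."""
    return max(policy[state], key=policy[state].get)


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_action(POLICY, "s0", rng))
    print(greedy_action(POLICY, "s1"))
```

In the bandit setting the policy would be a single distribution over actions; here the lookup by state is the only change.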

Now if all of these sets (states, actions, and rewards) are finite, this is a finite Markov decision process.

If you look at this figure, you can see this is a very abstract way of describing the entire

situation.

The agent is essentially the system that chooses the actions and designs the actions.

The environment is everything else.

So for example, if you were to control a robot, the robot itself would probably be part of

the environment, and the location of the robot is also encoded in the state, because everything

that can be done by the agent is merely designing actions, while the memory and the knowledge about

the current situation are encoded in the state.

Now let's look at a simpler example than controlling robots right away.

We will look into a simple game and you see this is a grid game, a grid world, and we

have a couple of squares and our agent then can move across these squares and of course

the position of the agent is also part of the state.

So you can formulate this such that s is the field that we are currently on. Our

agent can move in all four directions, so we can move up, down, left, and right, and any action

that would leave the grid has a transition probability that is equal to a Dirac delta function, so

it will always produce the previous state.
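
The grid-world transition rule can be sketched in a few lines. The 3x3 grid size, the (row, column) encoding of the state, and the names `next_state` and `transition_prob` are assumptions for illustration. The sketch also assumes that moves staying inside the grid are deterministic, which the passage above does not state.

```python
# Hypothetical 3x3 grid world; a state is a (row, col) tuple.
N_ROWS, N_COLS = 3, 3
MOVES = {"up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}


def next_state(state, action):
    """Apply a move; a move that would leave the grid returns the old state."""
    row, col = state
    d_row, d_col = MOVES[action]
    new_row, new_col = row + d_row, col + d_col
    if 0 <= new_row < N_ROWS and 0 <= new_col < N_COLS:
        return (new_row, new_col)
    return state  # off-grid move: the agent stays where it was


def transition_prob(s_next, state, action):
    """p(s' | s, a) as a delta: all probability mass sits on a single state."""
    return 1.0 if s_next == next_state(state, action) else 0.0


if __name__ == "__main__":
    print(next_state((0, 0), "up"))                # off-grid, stays at (0, 0)
    print(next_state((0, 0), "down"))              # moves to (1, 0)
    print(transition_prob((0, 0), (0, 0), "left"))
```

Because all probability mass sits on one successor state, the transition distribution for an off-grid move collapses to a delta on the previous state, as described above.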

Part of a video series:

Deep Learning - Plain Version

Presenters

Prof. Dr.-Ing. Andreas Maier

Accessible via

Open access

Duration

00:14:14 Min

Recording date

2020-10-12

Uploaded on

2020-10-12 19:56:27

Language

en-US

Deep Learning - Reinforcement Learning Part 2

This video explains the basics of reinforcement learning: the Markov decision process and how to compute the expected future return.

