11 - Machine Learning for Physicists [ID:8224]

The following content has been provided by the University of Erlangen-Nürnberg.

OK, hello, good evening everyone. So today I wanted to finish what we said about reinforcement

learning. In particular, I want to remind you of Q-learning and then show you one example

where people have applied Q-learning. And then I want to switch to something different,

which is the connection between these neural networks and the spin models of physics, and how that connection can be exploited in something that is called a restricted Boltzmann machine.

So there you will at least be reminded of your statistical physics lectures or maybe

you will for the first time learn about the Boltzmann weights.

Okay, but now first let's discuss Q-learning. I mentioned that the idea is that for each state our world, or our player, can be in, there is a function Q that tells us the quality of the different actions that we could take. And then of course the optimal strategy, if you know the right Q, is just to take the best action, that is, the action with the highest Q. And the task that makes everything difficult, of course, is how to learn this unknown function Q. Here's an example. If our player sits at this spot

and its goal is to collect as many of these green dots as possible, then probably the

Q function that tells us how good it is to move in the different directions would be

maximal for moving upward, because that is the easiest way to collect another green dot. And then we

said how in principle you would define such a Q function. You would define it as the expected

future reward given the current state and given the action that you take. This future

reward is either just the sum of all rewards from now until the end of the game, or maybe

you introduce this discount where you say I care most about immediate rewards and not

so much about later rewards, which makes it a little bit easier to train. And then we found that there is a recursive equation that tells us how this Q function should be defined in principle: the Q function for this particular step is the reward that we would get, plus the discount factor times the Q function of the optimal next step, the step that we take one step afterwards.
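
In the usual notation, with state s_t, action a_t, reward r_t and discount factor gamma between 0 and 1, this definition and the recursion can be sketched like this (standard textbook form, not necessarily the exact notation used on the slides):

```latex
% Q as the expected discounted future reward ("return") for taking action a_t in state s_t
Q(s_t, a_t) = \mathbb{E}\left[ r_t + \gamma\, r_{t+1} + \gamma^2 r_{t+2} + \dots \right]

% Recursive form: the reward we get now, plus the discount factor times the Q value
% of the best action taken one step afterwards
Q(s_t, a_t) = \mathbb{E}\left[ r_t + \gamma \max_{a'} Q(s_{t+1}, a') \right]
```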

and we don't know the right hand side, so we cannot calculate the left hand side. But

we can set up an iterative scheme that should converge eventually to the correct Q function.

And this is the one that is shown here: we take the old Q function and add to it a small increment, where the increment is chosen such that it would be zero if we were already at the correct Q function; if not, it takes us towards the correct Q function. So this is an equation that you can work with. And in principle, if the state space and the action space are sufficiently small, you could even explicitly keep a table of all your Q function values: for all possible s and all possible a, you just keep a long table in the memory of your computer, that table is the Q function, and you always update this table according to this equation.
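
As a rough illustration of what such a table and its update could look like in code, here is a short Python sketch (my own example, not code from the lecture; the sizes, the learning rate alpha and the discount factor gamma are assumed values):

```python
import numpy as np

n_states, n_actions = 100, 4      # assumed sizes of a small state and action space
alpha, gamma = 0.1, 0.9           # assumed learning rate and discount factor

# The Q function kept explicitly as a table: one entry for every (state, action) pair
Q = np.zeros((n_states, n_actions))

def q_update(s, a, r, s_next):
    """One Q-learning update: nudge Q[s, a] towards r + gamma * max_a' Q[s_next, a'].
    The increment vanishes once Q already satisfies the recursive equation."""
    target = r + gamma * np.max(Q[s_next])
    Q[s, a] += alpha * (target - Q[s, a])
```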

In reality, the state space may be exponentially large, because the state may be a whole picture that you see on a screen: each pixel can take different values, so the number of possible pictures is exponentially large, and it is out of the question to store the Q function as a big table, because that table would simply be too large. Then it is much better to represent the Q function by a neural network. The input of the neural network would be the current state, like a picture, and the output would be the Q values for the different actions that you can take, assuming that there are only very few actions, for example going north, south, west, or east.
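
A minimal sketch of such a network, assuming a Keras implementation with an 84 by 84 pixel picture as the state and four possible actions (these choices are my own, not taken from the lecture):

```python
import numpy as np
from tensorflow import keras

n_actions = 4                       # e.g. north, south, west, east
picture_shape = (84, 84)            # assumed size of the screen picture

# Input: the current state (a picture); output: one Q value per possible action
q_net = keras.Sequential([
    keras.layers.Flatten(input_shape=picture_shape),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(n_actions, activation="linear"),
])
q_net.compile(optimizer="adam", loss="mse")

# Acting greedily: feed in the current picture and pick the action with the largest Q
state = np.random.rand(1, *picture_shape)             # placeholder picture
best_action = int(np.argmax(q_net.predict(state, verbose=0)[0]))
```

During training, one would fit the network output for the chosen action towards the same target as in the table update above: the reward plus the discount factor times the maximal Q value of the next state.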

Okay. So here, just to visualize what this means, imagine I'm walking around on a grid.

The red dot is where I want to go; when I reach it, I will get a reward. My task is to reach this red dot in the minimum number of time steps, for example, and then I will get a high reward. And I'm trying to plot the Q function for the action "go up" as a function of the state s, where the state s is just the lattice site on which I currently am. And initially, after one update, I would have something like a Q function only on this one state, the state which is immediately below the target state. There, the Q function for going up is very high.
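
To see this behaviour in a concrete toy version, here is a self-contained Python sketch; it replaces the two-dimensional grid by a single column of lattice sites with the target at the top, so only the actions "up" and "down" exist, and all sizes and parameters are my own assumptions:

```python
import numpy as np

n_sites = 8                        # lattice sites 0 .. 7, the target ("red dot") sits at site 7
alpha, gamma = 1.0, 0.9            # assumed learning rate and discount factor
UP, DOWN = 0, 1
Q = np.zeros((n_sites, 2))         # Q[s, a] for the two possible actions

def run_episode():
    s = 0                                          # start at the bottom of the column
    while s != n_sites - 1:
        a = np.random.randint(2)                   # explore by acting randomly
        s_next = min(s + 1, n_sites - 1) if a == UP else max(s - 1, 0)
        r = 1.0 if s_next == n_sites - 1 else 0.0  # reward only for reaching the target
        Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) - Q[s, a])
        s = s_next

run_episode()
print(Q[:, UP])   # after one episode: only the site just below the target has a large Q for "up"
for _ in range(200):
    run_episode()
print(Q[:, UP])   # later: Q for "up" falls off in powers of gamma with the distance from the target
```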

Part of a video series

Accessible via: Open Access

Duration: 01:30:44 min

Recording date: 2017-07-17

Uploaded on: 2017-07-18 11:48:52

Language: en-US
