7 - Monte-Carlo Tree Search (Part 2) [ID:22262]

Okay, so the main question here is: how do we sample? What you really want is to balance exploration of the search space, which is essentially sampling, against exploitation of the information you have already gathered. So we go back here: we have explored a little bit of the search space, and at some point we say this is enough information, now we are going to act. When you do that is a choice that matters, and we have no general rule for when that actually is. It is something that has to be explored for each particular game, because there is no rule that says exploitation after three levels is good; we simply do not know that. It is a parameter, one of the parameters of your search algorithm which you might want to test. On a very abstract level you can say you have to strike a good balance between exploration and exploitation, and you have to optimize for that; how exactly is up to you.
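As a toy illustration of such a tunable balance, here is an epsilon-greedy move chooser; this is a minimal sketch and not from the lecture, and the function names and setup are assumptions for illustration:

```python
import random

def choose_move(moves, estimated_value, epsilon=0.1):
    """Epsilon-greedy sampling: with probability epsilon, explore a
    random move; otherwise exploit the best-looking move so far.
    `epsilon` is exactly the kind of parameter you would tune per game."""
    if random.random() < epsilon:
        return random.choice(moves)            # explore the search space
    return max(moves, key=estimated_value)     # exploit what we know
```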

And of course you can now go and use advanced mathematics or statistics to optimize here, and there are techniques for this. One is Upper Confidence bounds applied to Trees, UCT; that is the name that comes up. What you do there: as you can imagine, there is a well-understood area of mathematics about playing one-armed bandits in casinos. It is relatively easy to understand, and if you understand it well you might actually do better than if you do not; for some games you can even do card counting and those kinds of things and actually win. So casino situations are settings where we understand the mathematics extremely well, and if you look at this situation, or even better this one, you can think of them as random processes about which you can make predictions. These kinds of choices in the tree you can think of as a one-armed bandit: you know these machines where three discs spin and come back with a value, and then you have to choose which arm to pull, something like this. So you can apply multi-armed bandit theory to these kinds of choices, and that gives you a strategy which happens to work relatively well.
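To make UCT concrete, here is a minimal sketch of UCB1-based child selection in a search tree. The `Node` fields (`visits`, `total_reward`) and the exploration constant `c` are assumptions for illustration, not a full MCTS implementation:

```python
import math

def uct_select(children, c=1.4):
    """UCB1 applied to tree nodes (UCT): pick the child that maximizes
    average reward plus an exploration bonus that shrinks the more
    often the child has been visited."""
    parent_visits = sum(child.visits for child in children)

    def ucb1(child):
        if child.visits == 0:
            return float("inf")                # try unvisited children first
        exploit = child.total_reward / child.visits
        explore = c * math.sqrt(math.log(parent_visits) / child.visits)
        return exploit + explore

    return max(children, key=ucb1)
```

The first term favors children that have paid off so far (exploitation), while the square-root bonus favors rarely visited children (exploration); the constant c is again a parameter you would tune.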

But you can also do something else, and that is very attractive if you have huge amounts of computing power and not that much brain power or math power: you can take neural networks and let them learn by playing against each other. It is a little bit like the genetic algorithms we were looking at. The AlphaGo people went this route; Google backs them, so they have all the computing power you would ever want. Some of their networks are called policy networks and some are called value networks. You learn some policies from analyzing human games and human rule books and all of those kinds of things, where you just say: if you are in this kind of situation, then you put your piece there, unless you are White, then it goes there, something like this. So out of those games, if you can replay them, you can actually extract information. But you can also learn things from self-play, because you play against yourself, and sometimes you win against yourself and sometimes you lose against yourself, and that gives you an evaluation. Those evaluations you can, as we say, backpropagate through the neural networks, which is the trigger of the learning. Now, the neural networks themselves are standard stuff; there is very little choice that you can actually make. What you can do is look at the architecture of the networks, but usually you have almost completely connected networks with a many-to-one part at the end, roughly as in the sketch below, and up until recently…
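A minimal sketch of such a policy/value network and its self-play training signal, assuming PyTorch; the layer sizes, the flat board encoding, and all names here are illustrative assumptions, not AlphaGo's actual architecture:

```python
import torch
import torch.nn as nn

class PolicyValueNet(nn.Module):
    """Toy policy/value network: a shared trunk feeds a policy head
    (a distribution over moves) and a value head, the many-to-one part
    that squeezes the whole position into one win estimate."""
    def __init__(self, n_features=361, n_moves=362):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Linear(n_features, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
        )
        self.policy_head = nn.Linear(256, n_moves)   # logits over moves
        self.value_head = nn.Linear(256, 1)          # scalar win estimate

    def forward(self, x):
        h = self.trunk(x)
        return self.policy_head(h), torch.tanh(self.value_head(h)).squeeze(-1)

# Self-play supplies the training signal: the move that was played
# trains the policy head, the final outcome z trains the value head,
# and both errors are backpropagated through the network.
net = PolicyValueNet()
position = torch.randn(1, 361)        # stand-in for an encoded board
played_move = torch.tensor([42])      # index of the move chosen in self-play
z = torch.tensor([1.0])               # +1: we won this game against ourselves

logits, value = net(position)
loss = nn.functional.cross_entropy(logits, played_move) \
     + nn.functional.mse_loss(value, z)
loss.backward()                       # "the trigger of the learning"
```

The value head is the many-to-one part mentioned above: the whole position is reduced to a single win estimate, and the self-play outcome is what gets backpropagated.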

Part of chapter:
Adversarial Search for Game Playing

Access: Open access
Duration: 00:09:48 min
Recording date: 2020-10-30
Uploaded: 2020-10-30 10:26:44
Language: en-US

It is explained how a search should be guided and what is done in AlphaGo.
