The next algorithm we looked at yesterday is fundamentally different.
We're throwing systematicity out of the window and instead of having evaluation functions,
we actually sample.
The idea is that instead of computing evaluations, you sample all the way down, look at what these numbers tell you, and hope they're representative.
We're making an unsystematic assessment of the game here.
We drill all the way down by making random choices.
We find a game where we get a reward, a utility of 10, say.
We record that outcome and throw everything else away.
Then we do it again and again and again.
In this case, we've sampled each of the top-level moves twice.
We look at the average outcomes, and that lets us greedily, in a local-search-like fashion,
decide that the left one looks best.
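This random-sampling idea can be sketched in a few lines. The tree below is hypothetical (the move names and utilities are made up for illustration): each top-level move leads to a set of terminal utilities, and a playout just drills down randomly until it hits one.

```python
import random

# Hypothetical game tree: each top-level move leads to a list of
# terminal utilities; a random playout simply draws one at random.
tree = {
    "left":  [90, 10],
    "right": [60, 60],
}

def random_playout(move):
    # Drill all the way down by making random choices; here that
    # collapses to drawing one terminal utility at random.
    return random.choice(tree[move])

def monte_carlo_choice(samples_per_move=2):
    # Sample each move a fixed number of times, keep only the
    # average, and greedily pick the move with the best average.
    averages = {}
    for move in tree:
        total = sum(random_playout(move) for _ in range(samples_per_move))
        averages[move] = total / samples_per_move
    return max(averages, key=averages.get), averages
```

Note that nothing beyond the running totals has to be stored: each sampled path is thrown away as soon as its utility has been added in.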
What we're doing here with the sampling is we're really just using sampling instead of
some kind of an evaluation function.
If they forgot to give us an evaluation function and we can't dream up a good one, this is
something we can do.
This is a very widely applicable algorithm.
Why don't we take the maximum value?
Why do we take the average?
For example, in one branch there's 90 and 10, and in the other branch, 60 and 60.
The average of the first branch is 50 and of the second 60, but the maximum value, 90, is on the left side.
It all depends on how optimistic you are.
I would think that an average gives you some kind of a sense of what all the outcomes are.
If you're extremely optimistic, say, if there is a way I can win, I will find it, then you
go for the maximum.
It's a different evaluation.
It's even riskier than average.
If you take the maximum, that maximum might lie on a path where your
opponent plays extremely dumbly.
If you don't want to rely on your opponent being extremely dumb, you have to do something else.
Since you don't know how your opponent chooses, you can at least assume that your opponent
can be approximated by random choice, which is better than relying on a dumb opponent.
That's kind of why I would always pick average.
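The difference between the two backup rules on the 90/10 versus 60/60 example is easy to check directly. This is a minimal sketch; the branch names are made up:

```python
# Max-backup prefers the left branch (it contains the single best
# outcome, 90), while average-backup prefers the right branch, which
# is more reliable against a roughly random opponent.
branches = {"left": [90, 10], "right": [60, 60]}

def backup(values, rule):
    # "max" is the optimistic rule; "avg" models a random opponent.
    return max(values) if rule == "max" else sum(values) / len(values)

best_by_max = max(branches, key=lambda m: backup(branches[m], "max"))
best_by_avg = max(branches, key=lambda m: backup(branches[m], "avg"))
```

Here `best_by_max` is "left" and `best_by_avg` is "right": the two rules genuinely disagree on which move to play.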
If you're very optimistic, you play the lotto: there is a million in there, so I'm
going to play and I'm going to win. That's the maximum.
The average you get for playing lotto is: if you put in five euros, on average
you get, I think, five cents out.
This probably means that if you play often, you lose big time.
You might win if you're an optimist.
Even though I'm an optimist, I'm not that much of an optimist to play lotto.
That's one algorithm.
That's an algorithm that is really optimized, if you will, for running on very small
computers.
It needs six memory cells, essentially, plus a constant amount for storing the program
and the tree and so on.
Another thing we looked at is what you can do if you remember more nodes, because you
can get values for every node on the path you take down.
Typically that path is quite long.
For every node on the path you sampled, you get these kinds of values, which means that
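Keeping per-node statistics along each sampled path, rather than only at the root, might look like this. This is a sketch under assumed names; the paths and utilities are hypothetical:

```python
# Per-node statistics: visit count and sum of playout utilities.
stats = {}  # node -> [visits, total_utility]

def record(path, utility):
    # After one playout, update every node on the sampled path,
    # not just the root move.
    for node in path:
        visits, total = stats.setdefault(node, [0, 0.0])
        stats[node] = [visits + 1, total + utility]

# Two hypothetical playouts through the same inner node "a":
record(["root", "a", "a2"], 10)
record(["root", "a", "a1"], 90)
# "a" has now been visited twice with total utility 100,
# so its average value is 50.
```

These per-node averages are exactly the quantities that the full Monte-Carlo Tree Search algorithm later uses to steer sampling toward promising branches.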
Access: open access
Duration: 00:11:26 min
Recording date: 2020-10-30
Uploaded: 2020-10-30 10:37:38
Language: en-US
Recap: Monte-Carlo Tree Search (Part 1)
Main video on the topic in chapter 8 clip 6.