The next algorithm we looked at yesterday is fundamentally different.
We're throwing systematicity out of the window and instead of having evaluation functions,
we actually sample.
The idea is that instead of computing evaluations, you sample all the way down, look at what these numbers tell you, and hope they're representative.
We're making an unsystematic assessment of the game here.
We drill all the way down by making random choices.
We find a game where we get a reward, a utility of 10, say.
We record that outcome and throw everything else away.
Then we do it again and again and again.
In this case, we've sampled each of the top-level moves twice.
We look at the average outcomes, and that lets us greedily, in a local-search-like fashion,
decide that the left one looks best.
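This random-sampling idea can be sketched in a few lines. The tree below is hypothetical (the move names and utilities are made up for illustration): each top-level move leads to a set of terminal utilities, and a playout just drills down randomly until it hits one.

```python
import random

# Hypothetical game tree: each top-level move leads to a list of
# terminal utilities; a random playout simply draws one at random.
tree = {
    "left":  [90, 10],
    "right": [60, 60],
}

def random_playout(move):
    # Drill all the way down by making random choices; here that
    # collapses to drawing one terminal utility at random.
    return random.choice(tree[move])

def monte_carlo_choice(samples_per_move=2):
    # Sample each move a fixed number of times, keep only the
    # average, and greedily pick the move with the best average.
    averages = {}
    for move in tree:
        total = sum(random_playout(move) for _ in range(samples_per_move))
        averages[move] = total / samples_per_move
    return max(averages, key=averages.get), averages
```

Note that nothing beyond the running totals has to be stored: each sampled path is thrown away as soon as its utility has been added in.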
What we're doing here with the sampling is we're really just using sampling instead of
some kind of an evaluation function.
If they forgot to give us an evaluation function and we can't dream up a good one, this is
something we can do.
This is a very widely applicable algorithm.
Why don't we take the maximum value?
Why do we take the average?
For example, in one branch there's 90 and 10, and in the other branch, 60 and 60.
The average of the first branch is 50 and of the second 60, but the maximum value, 90, is on the left side.
It all depends on how optimistic you are.
I would think that an average gives you some kind of a sense of what all the outcomes are.
If you're extremely optimistic, say, if there is a way I can win, I will find it, then you
go for the maximum.
It's a different evaluation.
It's even riskier than average.
If you take the maximum, that maximum might lie on a path where your
opponent plays extremely dumbly.
If you don't want to rely on your opponent being extremely dumb, you have to do something else.
Since you don't know how your opponent chooses, you can at least assume that your opponent
can be approximated by random choice, which is better than relying on a dumb opponent.
That's kind of why I would always pick average.
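The difference between the two backup rules on the 90/10 versus 60/60 example is easy to check directly. This is a minimal sketch; the branch names are made up:

```python
# Max-backup prefers the left branch (it contains the single best
# outcome, 90), while average-backup prefers the right branch, which
# is more reliable against a roughly random opponent.
branches = {"left": [90, 10], "right": [60, 60]}

def backup(values, rule):
    # "max" is the optimistic rule; "avg" models a random opponent.
    return max(values) if rule == "max" else sum(values) / len(values)

best_by_max = max(branches, key=lambda m: backup(branches[m], "max"))
best_by_avg = max(branches, key=lambda m: backup(branches[m], "avg"))
```

Here `best_by_max` is "left" and `best_by_avg` is "right": the two rules genuinely disagree on which move to play.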
If you're very optimistic, you play the lotto: there is a million in there, so I'm
going to play and I'm going to win. That's the maximum.
The average you get for playing lotto is: if you put in five euros, on average
you get, I think, five cents out.
This probably means that if you play often, you lose big time.
You might win if you're an optimist.
Even though I'm an optimist, I'm not that much of an optimist to play lotto.
That's one algorithm.
That's an algorithm that is really optimized, if you will, for running on very small
computers.
It needs six memory cells, essentially, plus a constant amount for storing the program
and the tree and so on.
Another thing we looked at is what you can do if you remember more nodes, because you
can get values for every node on the path you take down.
Typically that path is quite long.
For every node on the path you sampled, you get these kinds of values, which means that
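Keeping per-node statistics along each sampled path, rather than only at the root, might look like this. This is a sketch under assumed names; the paths and utilities are hypothetical:

```python
# Per-node statistics: visit count and sum of playout utilities.
stats = {}  # node -> [visits, total_utility]

def record(path, utility):
    # After one playout, update every node on the sampled path,
    # not just the root move.
    for node in path:
        visits, total = stats.setdefault(node, [0, 0.0])
        stats[node] = [visits + 1, total + utility]

# Two hypothetical playouts through the same inner node "a":
record(["root", "a", "a2"], 10)
record(["root", "a", "a1"], 90)
# "a" has now been visited twice with total utility 100,
# so its average value is 50.
```

These per-node averages are exactly the quantities that the full Monte-Carlo Tree Search algorithm later uses to steer sampling toward promising branches.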
Access: open access
Duration: 00:11:26 min
Recording date: 2020-10-30
Uploaded: 2020-10-30 10:37:38
Language: en-US
Recap: Monte-Carlo Tree Search (Part 1)
Main video on the topic in chapter 8 clip 6.