Okay, then I would like to make a very, very brief foray into reinforcement learning.
Remember what we've done so far is learning from examples.
Which, say you are learning how to play chess, actually means that you have a trainer.
You play against somebody and somebody looks over your shoulder and says, do this now.
The best thing to do is put your king on c2.
Or, when you are about to move your king to c2: I wouldn't do that, because you'd be
in check then.
Those kinds of things.
Most people don't have that.
When you learn how to play chess, you're usually given the rules and then you play against
other people.
In learning AI, we give you homework problems.
And while you're solving them, we're not telling you: oh, do this and do not do that and so
on.
We don't look over your shoulder.
What we do instead is we accept your solution and then we say, oh, six and a half out of
ten points.
And then what?
If you think about it, we're not giving you labeled examples.
Chess doesn't give you labeled examples.
At least not at the move level.
Well, they sometimes do, because in your newspaper they'll have the chess section and say, oh,
checkmate in three or something like this.
That's kind of like worked examples.
They're very useful.
But if you think about it, there's a spectrum here.
Worked examples, labeled examples, are one extreme: they give you kind of the best of all worlds.
And chess gives you, well, it gives you some feedback.
Because if you play chess, in the end you lose or you win or you draw.
So it gives you some feedback, but in a relatively delayed way.
And unsupervised learning just basically means you're going to get feedback, but it might
take a while.
So learning without feedback is impossible,
because you don't have an external standard against which to compare yourself.
Remember our learning agent: the little box had this funny little side entry where
it receives feedback.
So unsupervised learning is essentially learning in situations where you don't actually have
examples, but you get a reward or, as we say, a reinforcement after a while.
So we need some kind of a feedback.
And there are many situations in which we get widely delayed reinforcements.
Chess is one of these examples.
You actually win or lose or draw after an hour.
In soccer, just to use a recent example, things are different.
You do win, lose or draw, but you score goals in between.
And goals are considered a good thing.
And you get penalties in between.
And penalties are considered a bad thing.
And those kinds of things.
So you can either have ultimate feedback or intermediate feedback.
But if you don't have labeled examples, we'll call it unsupervised learning.
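To make the difference between ultimate and intermediate feedback a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the function names, the event labels, and the reward values (+1 for a goal, -1 for a penalty, a single outcome value at the end of a chess game) are not from the lecture. It only shows what the reward sequence looks like in the two cases, not how to learn from it.

```python
from typing import List, Optional


def chess_rewards(num_moves: int, outcome: float) -> List[float]:
    """Ultimate feedback: no reward on any move except the last one.

    `outcome` is a hypothetical encoding, e.g. 1.0 win, 0.0 draw, -1.0 loss.
    Assumes num_moves >= 1.
    """
    return [0.0] * (num_moves - 1) + [outcome]


def soccer_rewards(events: List[str]) -> List[float]:
    """Intermediate feedback: goals and penalties are rewarded as they happen.

    The values +1.0 and -1.0 are illustrative assumptions.
    """
    values = {"goal": 1.0, "penalty": -1.0}
    return [values.get(event, 0.0) for event in events]


def first_feedback_step(rewards: List[float]) -> Optional[int]:
    """Index of the first step with a nonzero reward, or None if there is none."""
    for step, reward in enumerate(rewards):
        if reward != 0.0:
            return step
    return None


chess = chess_rewards(num_moves=4, outcome=1.0)
soccer = soccer_rewards(["pass", "goal", "penalty", "pass"])

print(chess, first_feedback_step(chess))
print(soccer, first_feedback_step(soccer))
```

In the chess-like sequence the first informative signal arrives only at the very last step, while in the soccer-like sequence it shows up in the middle of the game. That delay is what makes it hard to tell which earlier move deserves the credit.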
Accessible via: Open access
Duration: 00:11:14 min
Recording date: 2021-03-30
Uploaded on: 2021-03-31 08:16:46
Language: en-US
Motivation for Reinforcement Learning as a type of unsupervised learning. Overview of the different forms of Reinforcement Learning.