Recap Clip 11.1: Reinforcement Learning: Introduction & Motivation

Reinforcement learning is a form of unsupervised learning, and unsupervised learning is learning without labeled examples. It is a slightly more tedious way of learning, because you don't have examples you can simply optimize for. But it is, in a way, more realistic.

So we are learning from rewards, which in this case we also call reinforcements. These rewards can come at the end, or they can be hints that the environment gives the agent along the way. The fact that we have rewards actually points us in the direction in which a solution to reinforcement learning could lie: we have already introduced rewards as part of Markov decision processes.

The main idea in reinforcement learning is that you want to look at it, in a way, as an MDP, with the only difference that in Markov decision processes the reward function was totally observable, whereas in reinforcement learning it is only partially observable.
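As a minimal formal sketch of that difference (the notation is not spelled out in this clip, so take it as an assumption): an MDP comes with an explicit reward function, while in reinforcement learning the agent only ever sees sampled rewards along its trajectory.

```latex
% Sketch (assumed notation, not from the clip): MDP vs. reinforcement learning.
% An MDP is a tuple of states, actions, a transition model, a reward function,
% and a discount factor:
\[
  \mathcal{M} = (S, A, P, R, \gamma), \qquad
  P(s' \mid s, a), \quad R(s, a), \quad \gamma \in [0, 1).
\]
% In an MDP, $R$ is given, so we can plan directly (e.g. by value iteration).
% In reinforcement learning, $R$ is not given: the agent only observes reward
% samples $r_t$ while acting, and still tries to maximize the expected
% discounted return
\[
  \mathbb{E}\Bigl[\sum_{t \ge 0} \gamma^{t} r_t\Bigr].
\]
```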

You should also think of these as delayed rewards: the reinforcements don't come after every action. In MDPs we had a reward for every action; here the reinforcements really come at intervals, or only at the end. So we interpret that as a reward function which is only partially observable. In theory, as a fiction, you are getting a reward after every action, except that nobody tells you what it is, and that is realistic.
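To make "learning from sampled, possibly delayed rewards" concrete, here is a minimal tabular Q-learning sketch. It is not from the lecture, and the toy chain environment and its parameters are made up for illustration; the point is only that the agent updates from the reward samples the environment returns, never from the reward function itself.

```python
import random

# Toy chain environment (illustrative assumption, not from the lecture):
# states 0..5, the agent starts at 0, and the only reinforcement is a
# reward of 1.0 when it finally reaches the goal state 5 -- a delayed reward.
N_STATES = 6
ACTIONS = [0, 1]  # 0 = step left, 1 = step right

def step(state, action):
    """Return (next_state, reward, done); the reward function stays hidden."""
    next_state = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = next_state == N_STATES - 1
    reward = 1.0 if done else 0.0   # reinforcement only at the end
    return next_state, reward, done

# Tabular Q-learning: estimate Q(s, a) purely from observed reward samples.
Q = [[0.0 for _ in ACTIONS] for _ in range(N_STATES)]
alpha, gamma, epsilon = 0.1, 0.9, 0.1

for episode in range(500):
    state, done = 0, False
    while not done:
        # epsilon-greedy exploration
        if random.random() < epsilon:
            action = random.choice(ACTIONS)
        else:
            action = max(ACTIONS, key=lambda a: Q[state][a])
        next_state, reward, done = step(state, action)
        # update towards the sampled reward plus discounted future estimate
        target = reward + (0.0 if done else gamma * max(Q[next_state]))
        Q[state][action] += alpha * (target - Q[state][action])
        state = next_state

print(Q)  # after training, "right" (action 1) should dominate in every state
```

The design point is the same as in the clip: nothing in the update uses the reward function directly, only the reward values the environment happens to hand back.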

Think of it this way: you come to the AI lectures every Wednesday and Thursday, and you get a reward for that, even if you don't know it, because you are learning. That reward is not directly observable, but you are getting something out of it, apparently, or at least you expect to. Then, of course, the day of reckoning is Tuesday: you ultimately get your reinforcement in the exam. And of course there are intermediate rewards as well, in the points you get for your homework. So your actions only yield partially observable rewards. That's the difference, and that's something we have to account for.

So you really have something like MDP, or possibly even POMDP, situations with partially observable reward functions. That's most of what you should remember about this. There are lots of technical details, all of which I'm not going to tell you here. And there are lots of parameters you can learn, and that is what makes the problems difficult or simple.

Part of chapter: Recaps
Accessible via: Open access
Duration: 00:04:29 min
Recording date: 2021-03-30
Uploaded on: 2021-03-31 11:26:50
Language: en-US

Recap: Reinforcement Learning: Introduction & Motivation
Main video on the topic in chapter 11, clip 1.
