11 - Artificial Intelligence II

We're still looking at sequential decision procedures; if we look back at our agents, we're still in the same agent schema.

All right, well, there is this upper world-modeling part, and we're down here in the decision part.

Now, essentially, we've done that before, but now we're doing it sequentially, meaning in a sequential environment, where the utility of an action depends on a sequence of decisions.

We have to take time into account.

We were looking before at episodic environments, where time passes, but while the agent deliberates about what to do, it doesn't really play a role: time doesn't play a role.

Here, time does play a role, and we're trying to deal with that.

We're looking at Markov decision problems; that's something we actually did last week.

Today we're going to graduate to POMDPs, namely partially observable Markov decision processes. MDPs, Markov decision processes, are essentially the simplified case where we have a fully observable environment.

We want to eventually make decisions in a partially observable world where we have uncertainty

and utility to deal with.

But if, in the technical work we're doing, you feel a little bit unsatisfied because we're always talking about utilities and never about actions and so on, that's actually intended.

We already know what we have to do given a utility: we just do maximization of expected utility, which is basically the weighted sum of the utilities of an action's possible result states, each weighted, since we're uncertain, by the probability that the action gets us into the state we have the utility for.

Easy peasy.
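Spelled out as a formula (a sketch in the standard MDP notation, with current state $s$, transition model $P(s' \mid s, a)$, and utility function $U$; the symbols are assumed here, not quoted from the slides):

\[ a^\ast \;=\; \operatorname*{argmax}_{a} \; \sum_{s'} P(s' \mid s, a)\, U(s') \]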

The only difficult thing with time here is to actually find out what the utilities are, and in particular to deal with the utilities of time sequences.

That's the first step we have to do.
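A sketch of the standard way this is done, assuming the usual additive discounted-reward setup (the discount factor $\gamma$ is my notation here, not taken from this passage):

\[ U([s_0, s_1, s_2, \dots]) \;=\; \sum_{t \ge 0} \gamma^{t} R(s_t), \qquad 0 < \gamma \le 1 \]

For $\gamma < 1$ this sum converges even for infinite state sequences, which is what makes unbounded time horizons manageable.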

And the second step, equally important, is to somehow get from utilities of time sequences to utilities of individual states.

Because maximizing expected utility is not something that has utilities of time sequences baked into its genetics; it wants utilities of single states.

So the program, in a way, for MDPs is to say: well, can't we just make utilities of single states? That's what we want.

And last week we've managed to define that.
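For reference, a sketch of that definition in its usual form (assuming last week's material follows the standard treatment): the utility of a state is characterized by the Bellman equation

\[ U(s) \;=\; R(s) + \gamma \max_{a} \sum_{s'} P(s' \mid s, a)\, U(s') \]

Note that $U$ appears on both sides: this pins down $U$ only as a fixed point, which is exactly why having the definition does not yet tell us how to compute it.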

I'm going to try and convince you that even though we've managed to define that, we don't yet know how to compute it.

And that's what we're learning today.
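To make "learning how to compute it" concrete, here is a minimal sketch of value iteration in Python; the dictionary-based MDP encoding and all names below are my own illustration, not the course's code:

```python
def value_iteration(states, actions, P, R, gamma=0.9, eps=1e-6):
    """Compute state utilities of an MDP by value iteration.

    states  -- iterable of states
    actions -- function: state -> list of available actions
    P       -- dict: P[(s, a)] is a dict {successor: probability}
    R       -- dict: R[s] is the reward for being in state s
    gamma   -- discount factor, 0 < gamma < 1
    """
    U = {s: 0.0 for s in states}        # initial utility estimates
    while True:
        delta, U_new = 0.0, {}
        for s in states:
            # Bellman update: reward plus discounted best expected utility;
            # terminal states have no actions, so their best value is 0.
            best = max(
                (sum(p * U[s2] for s2, p in P[(s, a)].items())
                 for a in actions(s)),
                default=0.0,
            )
            U_new[s] = R[s] + gamma * best
            delta = max(delta, abs(U_new[s] - U[s]))
        U = U_new
        if delta < eps * (1 - gamma) / gamma:   # standard stopping bound
            return U
```

The loop simply applies the Bellman equation above as an update rule until the utilities stop changing; that is the whole trick for turning the fixed-point definition into a computation.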

Okay.

We're doing this bit here; planning we've already seen. And we will see, surprisingly, that in POMDPs plans and so on will make an appearance.

So everything is connected to everything.

Okay.

We looked at this four-by-three example, where we had reliable sensing but unreliable actions, and we had this reward function that basically says: every time tick, you get a small negative reward.
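To make the example concrete, here is a sketch of that four-by-three world in the encoding used by the value-iteration sketch above. The 0.8/0.1/0.1 action noise and the -0.04 step reward are the values from the standard textbook version of this example; treat them as assumptions if the lecture used different numbers:

```python
# The 4x3 grid world: states are (column, row); (2, 2) is a wall;
# (4, 3) and (4, 2) are terminal with rewards +1 and -1.
states = [(x, y) for x in range(1, 5) for y in range(1, 4) if (x, y) != (2, 2)]
terminals = {(4, 3): 1.0, (4, 2): -1.0}
R = {s: terminals.get(s, -0.04) for s in states}  # small negative reward per tick

MOVES = {'N': (0, 1), 'S': (0, -1), 'E': (1, 0), 'W': (-1, 0)}
LEFT  = {'N': 'W', 'W': 'S', 'S': 'E', 'E': 'N'}
RIGHT = {v: k for k, v in LEFT.items()}

def actions(s):
    return [] if s in terminals else list(MOVES)

def step(s, a):
    x, y = s
    dx, dy = MOVES[a]
    s2 = (x + dx, y + dy)
    return s2 if s2 in states else s              # bumping a wall: stay put

# Unreliable actions: 0.8 intended direction, 0.1 each perpendicular.
P = {}
for s in states:
    for a in MOVES:
        dist = {}
        for a2, p in ((a, 0.8), (LEFT[a], 0.1), (RIGHT[a], 0.1)):
            s2 = step(s, a2)
            dist[s2] = dist.get(s2, 0.0) + p
        P[(s, a)] = dist

U = value_iteration(states, actions, P, R, gamma=0.99)  # sketch from above
```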
