11 - Artificial Intelligence II [ID:57511]

Okay, so let's start.

This sounds loud.

Well, okay.

We are in the last reasoning under uncertainty chapter.

Remember, the day before yesterday we looked at things like Markov chains, dynamic Bayesian networks,

hidden Markov models and so on.

And if you think about what we've been doing, they're just the modeling component,

not the decision component.

Whenever we want to make decisions, we want to do this maximizing-expected-utility stuff.

All of that happens here.

That's what we're about now.

Some kind of a Markov chain or HMM or whatever happens here and kind of deals with the percepts

and how the world changes over time and so on.

And I wish I had colored chalk, but I don't.

So I'm going to kind of make a circle.

We still have the Markov something-or-other.

But that's the important thing now, making decisions.

That's something all of these areas actually share.

Search is all about making decisions.

Which in this case means choosing, typically heuristically, the next move or the plan that we're going to pursue.

And we did it kind of the easy way last semester.

We can add in uncertainty and utility, which gives us something we're not covering in this

course, but which is something you're very well equipped to understand by now.

And then we have what we will kind of do next.

Markov decision problems, MDPs, partially observable MDPs, POMDPs.

So that's kind of the new thing here.

And the example was this little four-times-three world, which kind of thrives on the fact that instead of directly having utilities, we sum up rewards under actions that are non-deterministic.

In this case, we still have full observability, which means we're still up here.

OK?
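To make that concrete, here is a minimal sketch of such a non-deterministic transition model, assuming the standard AIMA-style 4x3 grid world where an action succeeds with probability 0.8 and slips 90 degrees to either side with probability 0.1 each; the function and the names below are illustrative assumptions, not notation from the lecture.

```python
# Hypothetical sketch: stochastic motion model of the 4x3 grid world
# (assumed noise: 0.8 intended direction, 0.1 for each perpendicular slip).

LEFT_OF = {"N": "W", "W": "S", "S": "E", "E": "N"}
RIGHT_OF = {"N": "E", "E": "S", "S": "W", "W": "N"}

def motion_outcomes(action):
    """Return (probability, actual_direction) pairs for an intended action."""
    return [
        (0.8, action),            # move as intended
        (0.1, LEFT_OF[action]),   # slip 90 degrees to the left
        (0.1, RIGHT_OF[action]),  # slip 90 degrees to the right
    ]
```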

POMDPs come next.

So we're doing the slightly easier thing first as a warm-up exercise.

Here's the definition.

We have states.

We have transition models.

We have actions, just as before.

But we have a reward function that, per action, gives you a reward or a punishment.

And given that, the utility is no longer something that is given directly for a state, as we could maybe derive via Ramsey's theorem by looking at state preferences. Here the utility actually comes from the accumulated rewards, which changes the whole thing.

That's the generalization we will look at now.
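To pin down the pieces just listed, here is a minimal sketch of an MDP as a plain data structure; the field names, and the choice to attach rewards to states (as in the 4x3 example) rather than to actions, are my own illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, Hashable, List, Tuple

State = Hashable
Action = Hashable

@dataclass
class MDP:
    states: List[State]
    actions: Dict[State, List[Action]]   # actions applicable in each state
    # transition model: (state, action) -> list of (probability, successor state)
    transitions: Dict[Tuple[State, Action], List[Tuple[float, State]]]
    rewards: Dict[State, float]          # reward (or punishment) collected in a state
    gamma: float = 1.0                   # discount factor for accumulating rewards

    def R(self, s: State) -> float:
        return self.rewards[s]

    def T(self, s: State, a: Action) -> List[Tuple[float, State]]:
        return self.transitions[(s, a)]
```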

And we've seen that this is an interesting area, because it kind of leads to interesting

agent behaviors, without having to change the overall mechanisms, we hope.

OK?

So utilities over time.

The idea really is that instead of having utilities of states, we have to consider utilities over state sequences, which meshes very well with the notion of a sequential environment, which basically has the most general notion of time.
