12 - Artificial Intelligence II [ID:47303]

Okay, so let's restart.

We're basically at the last breaths of reasoning under uncertainty in agents.

We have progressed to partially observable Markov decision processes.

Markov decision processes are sequential decision problems where we kind of chickened out

of the partial observability part.

And we've basically looked at the basic algorithms.

Remember, we had filtering, smoothing, and all of those things for dynamic Bayesian networks.

If you think about utility-based agents, they basically have this world model component up there, which we built dynamic Bayesian networks for.

And there we do things like filtering and smoothing and those kinds of things.
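
For reference, that filtering step is just the usual forward update of a recursive Bayes filter (standard notation; $\alpha$ is a normalizing constant and $e_{t+1}$ the new evidence):

\[ b'(X_{t+1}) = \alpha\, P(e_{t+1} \mid X_{t+1}) \sum_{x_t} P(X_{t+1} \mid x_t)\, b(x_t) \]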

So down here, at least in utility-based agents, we always have this kind of decision procedure of maximizing the expected utility over the actions.

And the problem when we put time into this is exactly getting those utilities.

So what we've been doing technically over the last few days was basically just trying to get ourselves utilities based on states.

Because if we have that, we can just drop it in here and everything is fine.
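
Written out in the standard maximum-expected-utility notation, "drop it in here" means plugging state utilities $U$ into the one-step decision rule:

\[ a^* = \operatorname*{arg\,max}_{a \in A(s)} \sum_{s'} P(s' \mid s, a)\, U(s') \]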

That led, yesterday and last week, to the development of these value and policy iteration algorithms.

Right?

That gives us the utility and then we're fine.
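
As a minimal sketch of what value iteration does (a toy setup with hypothetical model tables, not the exact code from the lecture), it just applies the Bellman update until the utilities stop changing:

```python
# Minimal value-iteration sketch for a discrete MDP.
# Hypothetical inputs: P[s][a] is a list of (probability, next_state)
# pairs, R[s] is the reward in state s, gamma the discount factor.

def value_iteration(states, actions, P, R, gamma=0.9, eps=1e-6):
    U = {s: 0.0 for s in states}
    while True:
        new_U = {}
        delta = 0.0
        for s in states:
            # Bellman update: reward plus best discounted expected utility
            best = max(sum(p * U[s2] for p, s2 in P[s][a]) for a in actions)
            new_U[s] = R[s] + gamma * best
            delta = max(delta, abs(new_U[s] - U[s]))
        U = new_U
        if delta < eps * (1 - gamma) / gamma:  # standard stopping bound
            return U
```

The resulting U is exactly the state utility we can plug into the expected-utility decision rule above.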

And eventually, yesterday, we basically lifted that to POMDPs, and the price to be paid there was that we had to do it at the belief level.

Remember?

The belief state of an agent talks about the only partially known physical state: it is a probability distribution over physical states, which may or may not be observable, or partially observable, or whatever.

The belief state, however, is always observable.

There is a variable somewhere up there, right, which says state number one: 10%, state number three: 25%, and so on.

So the belief state itself is observable to the agent.

That's kind of the key insight that we're looking at here.
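
A minimal sketch of that belief update in code (hypothetical model tables T[a][s][s2] = P(s2 | s, a) and O[s2][e] = P(e | s2); this is just the discrete Bayes filter from above):

```python
# Discrete Bayes-filter belief update: new belief after taking action a
# and observing evidence e. T and O are hypothetical model tables.

def update_belief(belief, a, e, T, O):
    new_belief = {}
    for s2 in belief:
        # prediction: push the old belief through the transition model
        predicted = sum(T[a][s][s2] * belief[s] for s in belief)
        # correction: weight by how well s2 explains the observation
        new_belief[s2] = O[s2][e] * predicted
    total = sum(new_belief.values())  # normalizing constant alpha
    return {s2: p / total for s2, p in new_belief.items()}
```

Whatever the physical state actually is, this dictionary of probabilities is fully known to the agent, which is exactly why planning can move to the belief level.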

And we looked at a way of doing value iteration as an algorithm at the belief-state level.

And the outcome was extremely disappointing, because it's not only exponential, which we've dealt with in the past.

It is also doubly exponential, meaning the exponent of the exponential is itself an exponential.

So it's huge, meaning it doesn't really work in practice.
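
To see where the double exponential comes from: a depth-$d$ conditional plan picks one of $|A|$ actions and then one depth-$(d-1)$ subplan for each of the $|O|$ possible observations, so the number of distinct plans satisfies

\[ N_d = |A| \cdot N_{d-1}^{|O|} \quad\Longrightarrow\quad N_d = |A|^{\frac{|O|^d - 1}{|O| - 1}} = |A|^{O(|O|^{d-1})}, \]

which is doubly exponential in the horizon $d$.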

Now, that's the worst case.

Right?

It's really designed to work with very, very unstructured environments.

But we put quite a lot of work, remember, into Bayesian networks and dynamic Bayesian networks, where we were representing probabilistic reasoning in a way that lets us make use of the structure of the world, of the independencies and conditional independencies that kind of govern the environment.

Okay?

That's what dynamic Bayesian networks, namely Bayesian networks with time, were actually designed for.

So there's a hope that for highly structured environments like our own, we might even

be able to do better.

And that's what I want to give you a glimpse of.

Part of a video series
Access: open access
Duration: 01:31:01 min
Recording date: 2023-05-24
Uploaded: 2023-05-30 15:19:07
Language: en-US