Okay, so let's restart.
We're basically at the last breaths of reasoning with uncertainty in agents.
We have progressed to partially observable Markov decision processes.
Markov decision processes are sequential decision problems where we kind of chickened out
of the partial observability part.
And we've basically looked at the basic algorithms.
Remember, we had filtering, smoothing, and all of those things for dynamic Bayesian networks.
If you think about the utility-based agents, they basically have this world model component
up there, which we made dynamic Bayesian networks for, and there we do things like filtering
and smoothing and those kinds of things.
So down here, we always have, at least in utility-based agents, this kind of decision procedure
of maximizing the expected utility over the actions.
And the problem when we put time into this is exactly getting those utilities.
So what we've been doing technically over the last days was basically trying to get
ourselves utilities on states.
Because if we have those, we can just drop them in here and everything is fine.
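Just to make that "drop it in here" concrete: with state utilities $U(s)$ in hand, the decision procedure is the usual MEU rule (written here in the notation of the fully observable MDP setting; details such as discounting, and whether the reward enters here or inside $U$, depend on the exact formulation we used):

$$\pi^*(s) \;=\; \operatorname*{argmax}_{a \in A(s)} \; \sum_{s'} P(s' \mid s, a)\, U(s')$$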
Getting those state utilities is what led, yesterday and last week, to the development of these
value and policy iteration algorithms.
Right?
That gives us the utility and then we're fine.
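As a reminder of what value iteration does, here is a minimal sketch in Python; the dictionary-based encodings of the transition model and rewards are hypothetical choices for this example, not the exact code from the lecture slides:

```python
# Minimal value-iteration sketch for a fully observable MDP.
# The encodings of P and R below are hypothetical choices for this example.

def value_iteration(states, actions, P, R, gamma=0.9, eps=1e-6):
    """P[(s, a)] is a list of (prob, next_state) pairs; R[s] is the reward in s."""
    U = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        new_U = {}
        for s in states:
            # Bellman update: U(s) = R(s) + gamma * max_a sum_s' P(s'|s,a) * U(s')
            best = max(sum(p * U[s2] for p, s2 in P[(s, a)]) for a in actions)
            new_U[s] = R[s] + gamma * best
            delta = max(delta, abs(new_U[s] - U[s]))
        U = new_U
        if delta < eps:  # utilities have (approximately) converged
            return U
```

With the resulting utilities, the agent just applies the argmax rule above to pick its action.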
And eventually, yesterday, we basically lifted that to POMDPs, and the price to be paid
there was that we had to do it at the belief level.
Remember?
The belief state of an agent really talks about the only partially known physical
state: it's a probability distribution over physical states, which may or may not
be observable, or partially observable, or whatever.
The belief state itself, however, is always observable.
There is a variable somewhere up there, right?
Which says state number one, 10%, state number three, 25%, and so on.
So the belief state itself is observable to the agent.
That's kind of the key insight that we're looking at here.
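To make that concrete, here is a tiny sketch of a belief state and of the filtering-style update it undergoes after an action and an observation; the two-state world, the transition model T, and the sensor model O are made up purely for illustration:

```python
# Minimal sketch of a belief state and its update; the two-state world,
# transition model, and sensor model are hypothetical, just for illustration.

def update_belief(belief, action, evidence, T, O):
    """b'(s') = alpha * P(e|s') * sum_s P(s'|s,a) * b(s)."""
    new_belief = {}
    for s2 in belief:
        predicted = sum(T[(s, action)].get(s2, 0.0) * belief[s] for s in belief)
        new_belief[s2] = O[s2].get(evidence, 0.0) * predicted
    alpha = 1.0 / sum(new_belief.values())  # normalize
    return {s: p * alpha for s, p in new_belief.items()}

# Hypothetical two-state example: the agent is 70% sure it is in s1.
belief = {"s1": 0.7, "s2": 0.3}
T = {("s1", "go"): {"s1": 0.2, "s2": 0.8},   # P(s'|s, a)
     ("s2", "go"): {"s1": 0.1, "s2": 0.9}}
O = {"s1": {"beep": 0.9}, "s2": {"beep": 0.3}}  # P(e|s')
print(update_belief(belief, "go", "beep", T, O))
```

Note that everything the agent needs for this update is known to it, which is exactly why the belief state is fully observable even when the physical state is not.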
And we looked at a way of doing value iteration as an algorithm at the belief-state level.
And the outcome was extremely disappointing, because it's not only exponential, which we've
dealt with in the past; it's doubly exponential, meaning the exponent of the exponential is
actually an exponential itself.
So it's huge, meaning it doesn't really work in practice.
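One way to see where the double exponential comes from is to count the depth-$d$ conditional plans that belief-level value iteration effectively has to distinguish; with $|A|$ actions and $|E|$ possible observations, a depth-$d$ plan picks one action and then one depth-$(d-1)$ plan per observation (the standard counting argument, sketched here from memory):

$$N(d) = |A| \cdot N(d-1)^{|E|}, \qquad N(1) = |A| \quad\Longrightarrow\quad N(d) = |A|^{\frac{|E|^{d}-1}{|E|-1}},$$

so the exponent of $|A|$ itself grows exponentially in the horizon $d$.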
Now, that's the worst case.
Right?
It's really designed to work with very, very unstructured environments.
But we put quite a lot of work, remember, into Bayesian networks and dynamic Bayesian networks,
where we were representing probabilistic reasoning in a way that lets us make use of the structure
of the world, of the independencies and conditional independencies that govern the
environment.
Okay?
That's what dynamic Bayesian networks, namely Bayesian networks with time, were actually
designed for.
So there's hope that for highly structured environments like our own, we might even
be able to do better.
And that's what I want to give you a glimpse of.