OK. So the thing I've left out is how to find optimal policies at the belief-state level.
And we've briefly looked at how, given an optimal policy at the belief-state level, say one that is given to us from heaven, we would actually build an agent that operationalizes this policy.
And the upshot of that was: we're going to take dynamic Bayesian networks. Dynamic Bayesian networks, remember, are these time-sliced, Markov-like Bayesian networks. And we add decision nodes and utility nodes, just as we did when time didn't play a role, only that now we also have reward nodes.
Remember, in our models we have these rewards, or costs, which we know about for the past. And that's the situation here: we have rewards we know about, actions we know about, and evidence we know about.
And the model can be unrolled into the future; here we've unrolled it three time slices into the future. And we have the utility node, which sums up the expected rewards into one lump utility value for the future. It's the obvious generalization of the things we've done before.
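To make that lump utility concrete, here is a minimal sketch, not from the lecture: it assumes the per-slice expected rewards have already been computed by inference in the unrolled network, and the function name and discount parameter are hypothetical placeholders.

```python
def lump_utility(expected_rewards, gamma=1.0):
    """Sum the (optionally discounted) expected rewards over future slices."""
    return sum(gamma ** t * r for t, r in enumerate(expected_rewards))

# e.g. three time slices unrolled into the future:
print(lump_utility([1.0, 0.5, 0.25], gamma=0.9))
```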
And now we can compute. It turns out that, from the perspective of the agent, we essentially get an and-or tree. At any time slice, you have to take a decision on an action; those are the triangle nodes. And they are really fully determined, because we have a policy at the belief-state level: they're fully determined by the belief state. And since the belief state is fully observable, there is nothing that can go wrong. The belief-state policy is just a function we apply.
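A tiny sketch of that last point, with hypothetical names (`policy`, `belief_state`): the decision step is plain function application, with no search and no uncertainty involved.

```python
def act(policy, belief_state):
    # The belief state is fully known to the agent, so there is
    # no uncertainty at this step: we simply apply the policy.
    return policy(belief_state)
```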
Then we have the upcoming observations, the evidence. Those are something we don't know about, so we model them as a set of chance nodes. That gives us something to systematically search over: we take a decision, then the environment supplies the next percept, and we have to plan ahead for what that might be.
And we discussed the similarity to minimax here, except that in this case we're not playing against an adversary: the chance nodes stand in for the opponent, so we average over the possible percepts instead of assuming the worst case.
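Putting the two node types together, here is a sketch of the induced search, under stated assumptions: finite action and percept sets, and `update`, `reward`, and `percept_prob` as hypothetical stand-ins for belief update and inference in the DBN. Decisions are the "or" layers, where we maximize; percepts are the "and" layers, where we take an expectation, which is exactly where this differs from minimax.

```python
def value(belief, depth, actions, percepts, update, reward, percept_prob):
    """Expectimax-style value of a belief state in the and-or tree."""
    if depth == 0:
        return 0.0
    best = float("-inf")
    for a in actions:                  # "or" layer: our decision, maximize
        expected = 0.0
        for e in percepts:             # "and" layer: chance over percepts
            b2 = update(belief, a, e)  # next belief state after acting/observing
            expected += percept_prob(e, belief, a) * value(
                b2, depth - 1, actions, percepts, update, reward, percept_prob)
        best = max(best, reward(belief, a) + expected)
    return best
```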
Recap: Online Agents with POMDPs
The main video on this topic is in chapter 7, clip 6.