Recap Clip 7.6: Online Agents with POMDPs

OK.

So the thing I've left out is how to find optimal policies,

optimal policies at the belief-state level.

And we've briefly looked into how,

given an optimal policy at the belief-state level,

say one that is given to us from heaven,

we would actually build an agent that operationalizes

this policy.

And the upshot of that was we're going

to take dynamic Bayesian networks.

Dynamic Bayesian networks, remember,

are these time-sliced, Markov-like Bayesian networks.

And we add decision nodes and utility nodes,

just like we did when time didn't play a role,

except that now we also have reward nodes.

Remember, in our models we

have these rewards or costs or whatever, which

we know about for the past.

And that's kind of the situation here.

We have rewards we know about, we have actions we know about,

and we have evidence we know about.

And the model can be unrolled into the future.

We've unrolled it three times into the future.

And we have the utility node, which

sums up the expected rewards into a lump

utility value for the future.

It's kind of the obvious generalization

you get from those things we've done before.
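
To make that lump utility concrete, here is a minimal sketch in Python; the discount factor gamma is an assumption of this sketch, not something the clip fixes.

```python
# A minimal sketch of the utility node at the end of the unrolled model:
# it collapses the expected rewards of the future time slices into one
# lump utility value. The discount factor gamma is an assumption of this
# sketch; the clip itself does not fix one.

def lump_utility(expected_rewards, gamma=1.0):
    """Sum per-slice expected rewards E[R_t] into one utility value."""
    return sum(gamma ** t * r for t, r in enumerate(expected_rewards))

# Unrolled three times into the future, as in the clip:
print(lump_utility([1.0, 0.5, 0.25], gamma=0.9))  # 1.0 + 0.9*0.5 + 0.81*0.25
```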

And now we can compute.

And it turns out that, actually,

from the perspective of the agent,

we get an AND-OR tree, essentially.

At any time slice, you have to take a decision on an action.

Those are the triangle nodes.

And they are really fully determined,

because we have a policy at the belief-state level:

they're fully determined by the belief state.

And since the belief state is fully observable,

there is nothing that can go wrong.

It's just that the belief-state policy

is a function we apply.
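
As a rough sketch of what "the policy is just a function we apply" looks like as an agent loop; the names pi, filter_step, and environment are hypothetical placeholders for the given belief-state policy, the DBN filtering update, and the world:

```python
# A rough sketch of an online agent, assuming the belief-state policy pi
# is handed to us "from heaven". The names pi, filter_step and environment
# are illustrative placeholders, not from the lecture.

def run_agent(belief, pi, filter_step, environment, horizon):
    """pi: belief -> action (the given belief-state policy).
    filter_step: (belief, action, percept) -> belief (DBN filtering update).
    environment: action -> percept (the world answering our action).
    """
    for _ in range(horizon):
        action = pi(belief)        # nothing can go wrong: the belief is known
        percept = environment(action)
        belief = filter_step(belief, action, percept)
    return belief
```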

So then we have the upcoming observations, the evidence.

It's something we don't know about.

So we model that as a set of chance nodes.

That gives us something to systematically search over.

We take a decision, and then from the environment

the next percept comes in.

And we have to plan ahead for what that might be.

And we discussed the similarity to minimax here,

except that we're, in this case, not

playing against an adversary but against nature:

instead of minimizing over an opponent's moves,

we take the expectation over the possible percepts.
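
A compact sketch of that lookahead, with the expectation over percepts in place of an adversarial min; all helper names here (percept_distribution, filter_step, utility, pi) are hypothetical:

```python
# A compact sketch of the minimax-like lookahead, except against nature:
# decision layers are fixed by the belief-state policy pi,
# chance layers average over the possible next percepts.
# percept_distribution(b, a) yields (percept, probability) pairs,
# filter_step updates the belief, utility(b) evaluates a leaf belief.

def lookahead_value(belief, depth, pi, percept_distribution,
                    filter_step, utility):
    if depth == 0:
        return utility(belief)      # the lump utility node at the leaves
    action = pi(belief)             # decision node: fully determined
    value = 0.0
    for percept, prob in percept_distribution(belief, action):
        next_belief = filter_step(belief, action, percept)
        # chance node: expectation over percepts instead of an adversarial min
        value += prob * lookahead_value(next_belief, depth - 1, pi,
                                        percept_distribution, filter_step,
                                        utility)
    return value
```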

Part of chapter: Recaps
Access: open access
Duration: 00:08:15 min
Recorded: 2021-03-30
Uploaded: 2021-03-31 11:16:32
Language: en-US

Recap: Online Agents with POMDPs

The main video on this topic is chapter 7, clip 6.
