Let's talk about building an online agent.
Say we want to build a real-world agent,
an agent which can actually deal with the real world,
which is a POMDP.
What do we do?
Well, one way of doing it is to take decision networks.
Remember decision networks came from
Bayesian networks by adding action and utility nodes.
Remember we had the airport network,
I'm going to abbreviate like that.
To that network we added an action node,
and we added a utility node.
The action was: put the airport in Pittsburgh,
or put it between Nuremberg and Erlangen.
And we had various utilities.
An airport in Pittsburgh would, for people here, be a bad idea:
you'd have to swim too much.
So the idea is essentially to take the argmax over the actions,
maximizing the expected utility.
Good.
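As a concrete illustration of that argmax, here is a minimal sketch of evaluating such a decision network, assuming a toy version of the airport example; the outcome variable, the probabilities, and the utility values are made up for illustration and are not numbers from the lecture.

```python
# Decision network in miniature: for each action, compute the expected utility
# over the chance variable, then take the argmax.  All names and numbers below
# are illustrative assumptions, not the lecture's values.

# P(outcome | action): e.g. how much traffic the airport would attract
traffic_model = {
    "build_in_pittsburgh":      {"high_traffic": 0.1, "low_traffic": 0.9},
    "build_between_nbg_and_er": {"high_traffic": 0.7, "low_traffic": 0.3},
}

# U(action, outcome): utility node combining site cost and usefulness
utility = {
    ("build_in_pittsburgh", "high_traffic"): 10,
    ("build_in_pittsburgh", "low_traffic"): -50,   # nobody here can reach it
    ("build_between_nbg_and_er", "high_traffic"): 80,
    ("build_between_nbg_and_er", "low_traffic"): 20,
}

def expected_utility(action):
    return sum(p * utility[(action, outcome)]
               for outcome, p in traffic_model[action].items())

best_action = max(traffic_model, key=expected_utility)
print(best_action, expected_utility(best_action))
```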
What we can do at one time point, we can do at many time points.
And the result is something like this.
Remember, dynamic Bayesian networks were networks that essentially
are time-sliced.
That's this part.
Same thing essentially as our example for the umbrellas.
The Markov property, which we always assume, limits
the number of arrows between time slices,
and having a stationary Markov process
gives us the same conditional probability tables in every slice.
The Markov sensor property
likewise limits the number of sensor arrows,
and if we make the sensor model stationary as well,
its tables are the same too, so every time slice looks the same.
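To make the stationarity point concrete, here is a minimal sketch of such a stationary dynamic Bayesian network with one state variable and one sensor, reusing the same two tables in every time slice and doing the forward (filtering) update from the umbrella example; the probabilities are the usual textbook umbrella numbers and are only illustrative.

```python
# Stationary DBN sketch: because the process is Markov and stationary, a single
# transition table and a single sensor table are reused in every time slice.
# Numbers follow the standard umbrella example and are illustrative only.

P_rain_given_prev = {True: 0.7, False: 0.3}      # P(Rain_t = true | Rain_{t-1})
P_umbrella_given_rain = {True: 0.9, False: 0.2}  # P(Umbrella_t = true | Rain_t)

def filter_step(belief_rain, umbrella_observed):
    """One forward step: predict with the transition model, then weight by the
    sensor model and renormalize."""
    predicted = (belief_rain * P_rain_given_prev[True]
                 + (1 - belief_rain) * P_rain_given_prev[False])
    like_rain = (P_umbrella_given_rain[True] if umbrella_observed
                 else 1 - P_umbrella_given_rain[True])
    like_no_rain = (P_umbrella_given_rain[False] if umbrella_observed
                    else 1 - P_umbrella_given_rain[False])
    w_rain = predicted * like_rain
    w_no_rain = (1 - predicted) * like_no_rain
    return w_rain / (w_rain + w_no_rain)

belief = 0.5                       # initial belief that it rains
for obs in [True, True, False]:    # a short umbrella observation sequence
    belief = filter_step(belief, obs)
    print(round(belief, 3))
```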
Now we add action nodes, those things here, and of course we add them to every time
slice in the same way, also making them stationary with the same conditional
probability table. And we add utility nodes, and here is where there's a slight
wrinkle: we're actually not adding only utility nodes, we're also adding reward
nodes. Remember, a utility is something we basically have at the end of the
network, which sums up all the expected utilities at the end, and we have that
here as well: the utility of everything that happens after t+3. Before t+3 we
actually have rewards, and if we know what the state of the world is (which of
course we don't, but we have beliefs about it), we know what the reward is. So at
the finitely many time slices we add rewards, and we sum up the expected utility
of everything we haven't unrolled in this dynamic Bayesian network with a utility
node at the end.

What you want to see here is that there are some things we know: say we are in
X_t here, the current time, and we've basically projected forward so that we can
search over the next three time steps in this example.
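Here is a minimal, simplified sketch of that lookahead: starting from the current belief over X_t, we try every action sequence of length three, propagate the belief through an action-dependent transition model, sum the expected rewards of the unrolled slices, add a utility estimate for the belief after t+3, and take the argmax. The two-state world, all numbers, and the open-loop simplification (ignoring future observations during the search) are assumptions for illustration, not the lecture's exact construction.

```python
from itertools import product

# Dynamic decision network lookahead, heavily simplified: rewards are collected
# at the unrolled slices, a utility term stands in for everything beyond the
# horizon.  All names and numbers are made-up illustrations.

actions = ["repair", "wait"]

# P(X_{t+1} = good | X_t, action) -- stationary, action-dependent transitions
transition = {
    ("good", "repair"): 0.95, ("good", "wait"): 0.80,
    ("bad",  "repair"): 0.60, ("bad",  "wait"): 0.10,
}

reward = {"good": 5.0, "bad": -10.0}          # reward R(X_t) per time slice
action_cost = {"repair": 2.0, "wait": 0.0}

def step(belief_good, action):
    """Propagate the belief one slice forward under the chosen action."""
    return (belief_good * transition[("good", action)]
            + (1 - belief_good) * transition[("bad", action)])

def expected_reward(belief_good):
    return belief_good * reward["good"] + (1 - belief_good) * reward["bad"]

def horizon_utility(belief_good):
    # Crude stand-in for the utility node that sums up everything after t+3.
    return 20.0 * belief_good

def plan(belief_good, depth=3):
    best_value, best_plan = float("-inf"), None
    for seq in product(actions, repeat=depth):
        b, value = belief_good, 0.0
        for a in seq:
            b = step(b, a)
            value += expected_reward(b) - action_cost[a]
        value += horizon_utility(b)          # utility of the un-unrolled future
        if value > best_value:
            best_value, best_plan = value, seq
    return best_plan, best_value

print(plan(0.4))   # current belief that the world is in the "good" state
```

The argmax over action sequences here plays the same role as the single-step argmax in the decision network above, just repeated over the projected time slices.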
How to build an online agent with POMDPs, dynamic Bayesian networks, and dynamic decision networks. Also, a short summary of this chapter.