Let's talk about building an online agent.
Say we want to build a real-world agent,
an agent which can actually deal with the real world,
which is a POMDP.
What do we do?
Well, one way of doing it is to take decision networks.
Remember decision networks came from
Bayesian networks by adding action and utility nodes.
Remember we had the airport network,
I'm going to abbreviate like that.
To that network we added an action node,
and we added a utility node.
The action was: put the airport in Pittsburgh,
or put it between Nuremberg and Erlangen.
And we had various utilities.
An airport in Pittsburgh would, for people here, be a bad idea:
you'd have to swim too much.
So the idea is essentially to take the argmax over the actions,
maximizing the expected utility.
Good.
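As a concrete illustration of that argmax, here is a minimal sketch of evaluating such a decision network, assuming a toy version of the airport example; the outcome variable, the probabilities, and the utility values are made up for illustration and are not numbers from the lecture.

```python
# Decision network in miniature: for each action, compute the expected utility
# over the chance variable, then take the argmax.  All names and numbers below
# are illustrative assumptions, not the lecture's values.

# P(outcome | action): e.g. how much traffic the airport would attract
traffic_model = {
    "build_in_pittsburgh":      {"high_traffic": 0.1, "low_traffic": 0.9},
    "build_between_nbg_and_er": {"high_traffic": 0.7, "low_traffic": 0.3},
}

# U(action, outcome): utility node combining site cost and usefulness
utility = {
    ("build_in_pittsburgh", "high_traffic"): 10,
    ("build_in_pittsburgh", "low_traffic"): -50,   # nobody here can reach it
    ("build_between_nbg_and_er", "high_traffic"): 80,
    ("build_between_nbg_and_er", "low_traffic"): 20,
}

def expected_utility(action):
    return sum(p * utility[(action, outcome)]
               for outcome, p in traffic_model[action].items())

best_action = max(traffic_model, key=expected_utility)
print(best_action, expected_utility(best_action))
```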
What we can do at one time point, we can do at many time points.
And the result is something like this.
Remember, dynamic Bayesian networks were networks that essentially
are time-sliced.
That's this part.
Same thing essentially as our example for the umbrellas.
The Markov property, which we always assume, limits
the number of arrows between time slices,
and having a stationary Markov process
gives us the same conditional probability tables in every slice.
The Markov sensor property
likewise limits the number of sensor arrows,
and if we make the sensor model stationary as well,
its tables are the same too, so every time slice looks the same.
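To make the stationarity point concrete, here is a minimal sketch of such a stationary dynamic Bayesian network with one state variable and one sensor, reusing the same two tables in every time slice and doing the forward (filtering) update from the umbrella example; the probabilities are the usual textbook umbrella numbers and are only illustrative.

```python
# Stationary DBN sketch: because the process is Markov and stationary, a single
# transition table and a single sensor table are reused in every time slice.
# Numbers follow the standard umbrella example and are illustrative only.

P_rain_given_prev = {True: 0.7, False: 0.3}      # P(Rain_t = true | Rain_{t-1})
P_umbrella_given_rain = {True: 0.9, False: 0.2}  # P(Umbrella_t = true | Rain_t)

def filter_step(belief_rain, umbrella_observed):
    """One forward step: predict with the transition model, then weight by the
    sensor model and renormalize."""
    predicted = (belief_rain * P_rain_given_prev[True]
                 + (1 - belief_rain) * P_rain_given_prev[False])
    like_rain = (P_umbrella_given_rain[True] if umbrella_observed
                 else 1 - P_umbrella_given_rain[True])
    like_no_rain = (P_umbrella_given_rain[False] if umbrella_observed
                    else 1 - P_umbrella_given_rain[False])
    w_rain = predicted * like_rain
    w_no_rain = (1 - predicted) * like_no_rain
    return w_rain / (w_rain + w_no_rain)

belief = 0.5                       # initial belief that it rains
for obs in [True, True, False]:    # a short umbrella observation sequence
    belief = filter_step(belief, obs)
    print(round(belief, 3))
```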
Now we add action nodes, those things here, and of course we add them to every time
slice in the same way, also making them stationary with the same conditional
probability table. And we add utility nodes, and here is where there's a slight
wrinkle: we're actually not adding only utility nodes, we're also adding reward
nodes. Remember, a utility is something we basically have at the end of the
network, which sums up all the expected utilities at the end, and we have that
here as well: the utility of everything that happens after t+3. Before t+3 we
actually have rewards, and if we know what the state of the world is (which of
course we don't, but we have beliefs about it), we know what the reward is. So at
the finitely many time slices we add rewards, and we sum up the expected utility
of everything we haven't unrolled in this dynamic Bayesian network with a utility
node at the end.

What you want to see here is that there are some things we know: say we are in
X_t here, the current time, and we've basically projected forward so that we can
search over the next three time steps in this example.
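Here is a minimal, simplified sketch of that lookahead: starting from the current belief over X_t, we try every action sequence of length three, propagate the belief through an action-dependent transition model, sum the expected rewards of the unrolled slices, add a utility estimate for the belief after t+3, and take the argmax. The two-state world, all numbers, and the open-loop simplification (ignoring future observations during the search) are assumptions for illustration, not the lecture's exact construction.

```python
from itertools import product

# Dynamic decision network lookahead, heavily simplified: rewards are collected
# at the unrolled slices, a utility term stands in for everything beyond the
# horizon.  All names and numbers are made-up illustrations.

actions = ["repair", "wait"]

# P(X_{t+1} = good | X_t, action) -- stationary, action-dependent transitions
transition = {
    ("good", "repair"): 0.95, ("good", "wait"): 0.80,
    ("bad",  "repair"): 0.60, ("bad",  "wait"): 0.10,
}

reward = {"good": 5.0, "bad": -10.0}          # reward R(X_t) per time slice
action_cost = {"repair": 2.0, "wait": 0.0}

def step(belief_good, action):
    """Propagate the belief one slice forward under the chosen action."""
    return (belief_good * transition[("good", action)]
            + (1 - belief_good) * transition[("bad", action)])

def expected_reward(belief_good):
    return belief_good * reward["good"] + (1 - belief_good) * reward["bad"]

def horizon_utility(belief_good):
    # Crude stand-in for the utility node that sums up everything after t+3.
    return 20.0 * belief_good

def plan(belief_good, depth=3):
    best_value, best_plan = float("-inf"), None
    for seq in product(actions, repeat=depth):
        b, value = belief_good, 0.0
        for a in seq:
            b = step(b, a)
            value += expected_reward(b) - action_cost[a]
        value += horizon_utility(b)          # utility of the un-unrolled future
        if value > best_value:
            best_value, best_plan = value, seq
    return best_plan, best_value

print(plan(0.4))   # current belief that the world is in the "good" state
```

The argmax over action sequences here plays the same role as the single-step argmax in the decision network above, just repeated over the projected time slices.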
How to build an online agent with POMDPs, dynamic Bayesian networks, and dynamic decision networks. Also, a short summary of this chapter.