Great. So let's start with an overview of how all of these ideas relate to the ideas we already know. You remember that last semester we basically started out with simple search algorithms that all of you know, like depth-first, breadth-first, A-star and so on and so forth. And in many instances, let's say I'm in position A on some map and I want to get to position B, how do I get to B? You can think of that as a decision problem in the sense that I have to make decisions: do I take this road or do I take that road? We can solve that problem using search methods alone. It's not exactly efficient, but we can. So all of that is basically the domain of the search area at the top.
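Just to make that framing concrete, here is a minimal sketch, not from the lecture: a breadth-first search over a small, hypothetical road map, where every expansion step is one of those "which road do I take" decisions. The map and the node names are made up purely for illustration.

```python
from collections import deque

def bfs_path(graph, start, goal):
    """Breadth-first search: returns a list of nodes from start to goal, or None.
    Each step down a branch corresponds to one 'which road do I take' decision."""
    frontier = deque([[start]])
    visited = {start}
    while frontier:
        path = frontier.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbour in graph.get(node, []):
            if neighbour not in visited:
                visited.add(neighbour)
                frontier.append(path + [neighbour])
    return None

# Hypothetical toy road map as adjacency lists.
roads = {"A": ["C", "D"], "C": ["B"], "D": ["C", "B"]}
print(bfs_path(roads, "A", "B"))  # ['A', 'C', 'B']
```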
Now from there we can do two things to make things more interesting. The one thing we can do is add explicit actions; I'm actually not sure what's meant by that. But what we definitely can do is add sub-goals. In that case the classic search algorithms are really not feasible anymore. But if we do that, what we end up with is a planning problem, and we talked about how to tackle planning problems last semester. Do you remember the STRIPS algorithm and all of that stuff? I think I tortured you a lot with that. No wait, I tortured you way too little with that, I remember. Anyway, the alternative way we can go is to add uncertainty and utility.
Well, as for utility: in some sense we can already add utilities to search problems. As soon as I have actual costs for going from one node to another node in a search problem, I basically have some sense of utility. You can think of a greedy algorithm as just trying to maximize the utility of some result.
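A tiny sketch of that reading, with a made-up map and costs (turning a step cost into a utility by negating it is my illustration, not a definition from the lecture):

```python
def greedy_step(state, neighbours, utility):
    """Greedily pick the successor of `state` with the highest utility estimate."""
    return max(neighbours[state], key=utility)

# Hypothetical step costs; the "utility" of a neighbour is just its negated cost.
costs = {("A", "C"): 4, ("A", "D"): 1}
neighbours = {"A": ["C", "D"]}
print(greedy_step("A", neighbours, utility=lambda s: -costs[("A", s)]))  # -> D
```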
But uncertainty, of course, is a different beast. So if we add uncertainty and utility to a search problem, what we end up with are Markov decision problems, which we're going to talk about.
And from those two nodes I can basically do a category-theoretic pushout over exactly those two factors I've already added. And if I do so, then I get to decision-theoretic planning. I'm actually not sure we're going to talk about that in this class at all. Probably not, I don't remember. Anyway, decision-theoretic planning does exist, it's a thing. And you can think of it either as taking planning and adding uncertainty, or as taking a Markov decision problem and adding sub-goals.
And the one we're going to be most interested in is partially observable Markov decision problems, which you get by taking a Markov decision problem and adding a probabilistically observable environment, basically. So we're just assuming that all of our sensors are not perfect, which is of course a reasonable assumption. And the interesting thing is that the way we're actually going to solve partially observable Markov decision problems is by turning them back into classic Markov decision problems, just not on physical states but on our belief states. That's going to be the basic idea.
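To preview that idea with a minimal sketch (the sensor model O used here is an assumption for illustration; we have not formally defined one yet): a belief state is just a probability distribution over the physical states, and after performing an action and receiving an observation we update it using the transition model and the sensor model.

```python
def update_belief(belief, action, observation, states, T, O):
    """One belief-state update:  b'(s') is proportional to O[s'][o] * sum_s T[s][a][s'] * b(s).

    belief : dict mapping state -> probability (the current belief state)
    T      : T[s][a][s_next] is the transition probability P(s_next | s, a)
    O      : O[s][o] is the (assumed) probability of observing o in state s
    T and O are assumed to be defined for every state and action.
    """
    new_belief = {}
    for s_next in states:
        predicted = sum(T[s][action].get(s_next, 0.0) * p for s, p in belief.items())
        new_belief[s_next] = O[s_next].get(observation, 0.0) * predicted
    total = sum(new_belief.values())
    # Normalize so the new belief is again a probability distribution.
    return {s: p / total for s, p in new_belief.items()} if total else new_belief
```

An agent can then run the ordinary MDP machinery on these belief states instead of on the physical states; that is the sense in which the partially observable problem is turned back into a classic one.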
Right. Let's start with an example again, just so we know what kind of situation we're actually talking about. So let's assume you have this nice little four-by-three square map, quote unquote. We start in the lower-left square here, and ideally we want to end up in the upper-rightmost field, which gives us a reward of plus one, and we definitely don't want to end up in this field, which gives us a reward of minus one. We model all of this by letting all of those fields constitute a set of states, and we can think of the actions that we can perform at each state as modeled by a set A. Of course, what are the actions going to be? Well, either we move up, we move right, we move left, or we move down. We can again build a transition model for that,
where we basically just take the probability that some action a, performed in state s, leads to some new state s'. In this example we just assume that if I try to move in some direction, I only end up in the field I intended with a probability of 80 percent; with 10 percent I end up in the field to the left, and with 10 percent in the field to the right, of course assuming that it's possible to end up there in the first place.
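As a rough sketch of that transition model in code (the coordinate encoding, the exact positions of the plus-one and minus-one fields, and the rule that bumping into the border leaves you where you are, are my assumptions about the slide, not something spelled out here):

```python
# A sketch of the 4x3 grid world. States are (column, row) pairs; the terminal
# positions and the "stay in place when you hit the border" rule are assumed.
COLS, ROWS = 4, 3
STATES = [(x, y) for x in range(1, COLS + 1) for y in range(1, ROWS + 1)]
TERMINAL_REWARD = {(4, 3): +1.0, (4, 2): -1.0}   # assumed +1 / -1 fields
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
# The two "slip" directions that are perpendicular to each intended move.
PERPENDICULAR = {"up": ("left", "right"), "down": ("left", "right"),
                 "left": ("up", "down"), "right": ("up", "down")}

def move(state, direction):
    """Deterministic effect of one move; bumping into the border keeps us in place."""
    x, y = state
    dx, dy = ACTIONS[direction]
    target = (x + dx, y + dy)
    return target if target in STATES else state

def transition_model(state, action):
    """P(s' | s, a): 80% the intended direction, 10% for each perpendicular slip."""
    probs = {}
    for direction, p in [(action, 0.8),
                         (PERPENDICULAR[action][0], 0.1),
                         (PERPENDICULAR[action][1], 0.1)]:
        s_next = move(state, direction)
        probs[s_next] = probs.get(s_next, 0.0) + p
    return probs

print(transition_model((1, 1), "up"))
# {(1, 2): 0.8, (1, 1): 0.1, (2, 1): 0.1} -- the slip to the left bounces off the border
```

The returned dictionary merges outcomes that land on the same square, which is why bumping into the border shows up as extra probability mass on staying put.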
And now we can think about the following: if we want to design an agent that solves this problem for us in an offline manner, we probably want some kind of reward function for the intermediate fields as well, not just for ending up here or ending up here. For
Definition of Sequential Decision Problems and an overview of search. Definitions, solutions, and examples for Markov Decision Processes are discussed.