2 - 24.1. Sequential Decision Problems [ID:30357]

Great. So let's start with an overview of how all of these ideas relate to the ones we already know. You remember that last semester we basically started out with simple search algorithms that all of you know, like depth-first search, breadth-first search, A*, and so on. In many instances, say I'm at position A on some map and I want to get to position B, how do I get to B? You can think of that as a decision problem in the sense that I have to make decisions: do I take this road, or do I take that road? We can kill that problem using search methods alone. It's not exactly efficient, but we can. So all of that is basically the domain of the search area at the top.

Now from there we can do two things to make the setting more interesting. The one thing we can do is add explicit actions. I'm actually not sure what's meant by that. But what we definitely can do is add sub-goals, and in that case the classic search algorithms are already not really feasible anymore. If we do that, what we end up with is a planning problem, and we talked about how to tackle planning problems last semester. Do you remember the STRIPS algorithm and all of that stuff? I think I tortured you a lot with that. No wait, I tortured you way too little with that, I remember.

Anyway, the alternative way we can go is to add uncertainty and utility. Utility, in some sense, we can already add to search problems: as soon as I have actual costs for going from one node to another node in a search problem, I basically have some sense of utility, and you can think of a greedy algorithm as just trying to maximize the utility of some result. But uncertainty, of course, is a different beast. So if we add uncertainty and utility to a search problem, what we end up with are Markov decision problems, which we're going to talk about. And given those two nodes, I can basically do a category-theoretic pushout over exactly those two factors I've already had, and if I do so, I get to decision-theoretic planning. I'm actually not sure we're going to talk about that in this class at all. Probably not, I don't remember. Anyway, decision-theoretic planning does exist, it's a thing, and you can think of it either as taking planning and adding uncertainty, or as taking a Markov decision problem and adding sub-goals.

The one we're going to be most interested in is partially observable Markov decision problems, which you get by taking a Markov decision problem and adding a probabilistically observable environment. We're basically just assuming that all of our sensors are imperfect, which is a reasonable assumption, of course. And the interesting thing is that the way we're actually going to solve partially observable Markov decision problems is by turning them back into classic Markov decision problems, just not on physical states but on our belief states. That's going to be the basic idea.
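To make that belief-state idea a bit more concrete, here is a minimal sketch of one belief update step. This is my own illustration rather than something from the slides; the transition model T, the sensor model O, and the set of states are hypothetical placeholders. The agent keeps a probability distribution over states and, after acting and receiving a noisy observation, predicts with the transition model and then reweights by how well each state explains the observation.

```python
def update_belief(belief, action, observation, states, T, O):
    """One Bayes-filter step over a belief state.

    belief       -- dict mapping state -> probability
    T(s, a, s2)  -- transition probability P(s2 | s, a)
    O(s2, o)     -- sensor model P(o | s2)
    """
    new_belief = {}
    for s2 in states:
        # Prediction: where could the action have taken us?
        predicted = sum(T(s, action, s2) * belief[s] for s in states)
        # Correction: weight by the likelihood of the observation.
        new_belief[s2] = O(s2, observation) * predicted
    # Normalize so the result is again a probability distribution.
    total = sum(new_belief.values())
    return {s: p / total for s, p in new_belief.items()}
```

The point of the remark above is that the POMDP then looks like an ordinary Markov decision problem whose "states" are exactly these belief distributions.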

Right. Let's start with an example again, just so we know what kind of situation we're actually talking about. So let's assume you have this nice little four-by-three square map, quote unquote. We start in the lower left square here, and ideally we want to end up in the upper right-most field, which gives us a reward of plus one, and we definitely don't want to end up in this other field, which gives us a reward of minus one. We model all of this by letting all of those fields constitute a set of states S, and we model the actions that we can perform at each state as a set A. Of course, what are the actions going to be? Well, either we move up, or we move right, or we move left, or we move down. We can again build a transition model for this, where we basically just take the probability P(s' | s, a) that some action a in state s leads to some new state s'. In this example we just assume that if I try to move in some direction, I actually end up in the field I intended only with probability 80 percent, and with 10 percent each I end up in the field to the left or to the right of it, assuming of course that it's possible to end up there in the first place.
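Here is a minimal sketch of what that transition model could look like in code. The coordinate convention and the handling of impossible moves are my own assumptions for illustration; in the usual formulation, a move that would leave the grid (or hit a blocked square, if there is one) simply leaves the agent where it is.

```python
from collections import defaultdict

# The grid: 4 columns by 3 rows, states addressed as (column, row), both starting at 1.
MOVES = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}
# 90-degree "slips": if we intend to go up, we may end up going left or right, etc.
PERPENDICULAR = {"up": ["left", "right"], "down": ["left", "right"],
                 "left": ["up", "down"], "right": ["up", "down"]}

def transition_model(state, action, width=4, height=3, blocked=frozenset()):
    """Return a dict {next_state: probability} for the 4x3 grid world.

    80% of the mass goes to the intended neighbour, 10% each to the two
    perpendicular neighbours; moves off the grid (or into a blocked square,
    if any are given) leave the agent where it is.
    """
    def attempt(direction):
        dx, dy = MOVES[direction]
        nxt = (state[0] + dx, state[1] + dy)
        inside = 1 <= nxt[0] <= width and 1 <= nxt[1] <= height
        return nxt if inside and nxt not in blocked else state

    probs = defaultdict(float)
    probs[attempt(action)] += 0.8
    for slip in PERPENDICULAR[action]:
        probs[attempt(slip)] += 0.1
    return dict(probs)
```

For instance, transition_model((1, 1), "up") gives {(1, 2): 0.8, (1, 1): 0.1, (2, 1): 0.1}: moving up from the lower left square mostly works, but slipping to the left bumps into the boundary and leaves us where we are.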

And now, if we want to design an agent that solves this problem for us in an offline manner, we probably want some kind of reward function for the intermediate fields as well, not just for ending up here or there. For ...
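As a companion sketch for that reward function: only the plus-one and minus-one terminal rewards come from the example above; the position of the minus-one square and the small negative per-step reward below are placeholder assumptions I've filled in for illustration, not values from the lecture.

```python
# Placeholder positions and step reward, purely for illustration:
GOAL = (4, 3)          # the upper right field with reward +1
TRAP = (4, 2)          # the -1 field; its exact position is an assumption
STEP_REWARD = -0.04    # small negative reward for every other square (assumed value)

def reward(state):
    """Reward for occupying a square of the 4x3 grid world."""
    if state == GOAL:
        return 1.0
    if state == TRAP:
        return -1.0
    return STEP_REWARD
```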

Part of chapter: Chapter 24. Making Complex Decisions
Access: open access
Duration: 00:11:56 min
Recording date: 2021-03-29
Uploaded: 2021-03-30 14:47:12
Language: en-US

Description: Definition of Sequential Decision Problems and an overview of search. Definitions, solutions, and examples for Markov Decision Processes are discussed.
