24.4. Partially Observable MDPs

Next, I would like to move on to partially observable MDPs.

This is the last thing we're going to add: it makes our life as an agent horrible, but it lets us perform better as an agent.

Because we're adding noisy sensors, essentially.

So in our example, that's relatively simple.

We still take the same environment, the same kind of rewards, all of those kinds of things.

But instead of always knowing exactly where the walls are, so that if I'm in a square my sensor says "there's a wall to the north, there's a wall to the south", we're now adding just one thing: maybe.

I think there's a wall here, and I think there's a wall there. But I might be wrong.

That's the big new addition.

If you think of this as a maze, just add some fog, or dark glasses, or something like that: you're never entirely sure.

What we need to add is essentially a non-deterministic sensor model. And that is something we already know how to do; we've already looked at that.

Our umbrella example was exactly that. We had a sensor model with hidden variables, namely the weather outside: is it raining or is the sun shining? And we had observable outcomes, namely whether the director brings an umbrella or not. And the director is unreliable.

It's the same situation here.

We have a state, and that state is hidden: we don't know what the state of the world is, we only have certain beliefs about it. But we also have evidence.

The evidence, and this is very important to see, is something we can observe accurately. In the umbrella example, we assume that we can accurately sense whether the director brings the umbrella or not; it's only the director who is an unreliable indicator of the weather. If my eyes are unreliable too, we can always factor that into the director's probabilities.

So what we're really looking at is the probability of seeing a certain piece of evidence given a state. Or, if we know enough priors, the other way around via Bayes' rule.
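Written out in symbols, and using the textbook's usual umbrella numbers purely as an assumed illustration (the lecture does not restate them here):

```latex
% Sensor model: probability of the evidence given the hidden state.
% The 0.9 / 0.2 umbrella values are the standard textbook numbers,
% assumed here only for illustration.
\[
  P(e \mid s): \quad
  P(\mathit{umbrella} \mid \mathit{rain}) = 0.9, \qquad
  P(\mathit{umbrella} \mid \lnot\mathit{rain}) = 0.2
\]
% Bayes' rule turns the sensor model around once we have a prior over states:
\[
  P(s \mid e) \;=\; \frac{P(e \mid s)\, P(s)}{\sum_{s'} P(e \mid s')\, P(s')}
\]
```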

So what we have in our example is a partial, or noisy, sensor, which is essentially the same thing. We have a sensor that counts the number of adjacent walls, and we give it an error of 0.1.

By the way, we also had a 0.1 error for the action uncertainty. That's just a coincidence to keep the numbers easy; there is nothing that ties the accuracy of our actions to the accuracy of our sensors.
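As a minimal sketch of what that sensor model could look like: assume the sensor reports the true wall count with probability 0.9, and with probability 0.1 it errs. How the error mass is distributed is not pinned down in the lecture; spreading it uniformly over the other possible counts is an assumption made here.

```python
# Hypothetical sketch of the noisy wall-count sensor model P(e | s).
# Assumption: the sensor reads the true number of adjacent walls with
# probability 0.9; with probability 0.1 it errs, and we spread that
# error mass uniformly over the other possible counts.

ERROR = 0.1
POSSIBLE_COUNTS = range(5)  # 0 to 4 adjacent walls in a grid world

def sensor_model(observed: int, true_count: int) -> float:
    """P(sensor reports `observed` walls | square really has `true_count` walls)."""
    if observed == true_count:
        return 1.0 - ERROR
    return ERROR / (len(POSSIBLE_COUNTS) - 1)  # uniform over the wrong readings
```

For example, in a corridor square with two adjacent walls, sensor_model(2, 2) returns 0.9, and each of the four wrong readings has probability 0.025.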

So the immediate consequence is that the agent does not know which state it is in; it cannot reliably say which one.
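What the agent can maintain instead is a belief state: a probability distribution over the possible states, updated after every action and observation as b'(s') = α · P(e | s') · Σ_s P(s' | s, a) · b(s). Below is a minimal sketch of that filtering step; transition_model and sensor_model are hypothetical stand-ins with the signatures shown, not the lecture's concrete 4x3-world code.

```python
# Minimal belief-state filtering step (a sketch, not the lecture's code).
# belief:           dict mapping state -> probability, summing to 1
# transition_model: transition_model(s2, a, s) = P(s2 | s, a), hypothetical stand-in
# sensor_model:     sensor_model(e, s2)       = P(e | s2),     hypothetical stand-in

def update_belief(belief, action, evidence, states, transition_model, sensor_model):
    """Return the new belief b'(s') proportional to P(e|s') * sum_s P(s'|s,a) * b(s)."""
    new_belief = {}
    for s2 in states:
        # Prediction step: where could the action have taken us?
        predicted = sum(transition_model(s2, action, s) * belief[s] for s in states)
        # Correction step: weight by how well s2 explains the evidence.
        new_belief[s2] = sensor_model(evidence, s2) * predicted
    alpha = 1.0 / sum(new_belief.values())  # normalization constant
    return {s: alpha * p for s, p in new_belief.items()}
```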

Part of a chapter: Chapter 24. Making Complex Decisions
Access: Open Access
Duration: 00:20:42 min
Recording date: 2021-03-29
Uploaded on: 2021-03-30 15:17:25
Language: en-US

Description: Definition of partial observability and how to filter at the belief-state level.
