4 - 23.2. Inference: Filtering, Prediction and Smoothing (Part 3) [ID:30352]

Three of the four inference tasks are done.

The last one is most likely explanation, where we actually want to recover the most likely explanation for what we have observed.

The thing you should really realize, and I have said it before, is that the sequence of most likely states, as you get it from filtering for instance, is not actually the most likely sequence. Say we have the umbrella sequence true, true, false, true, true, and the question is: what is the most likely weather sequence?

This is the typical situation an archaeologist finds herself in: you have some records of whether the director brought an umbrella, and you want to know whether it actually rained.

And the important thing is that you have to reason at the level of whole sequences, not at the level of individual events.

The question, of course, is what to make of what we are seeing here, say, no umbrella on Wednesday. What does that mean? Perhaps it did not rain, or perhaps the director simply forgot the umbrella. Or take this true here on Thursday: maybe the director came with a walking stick and I misread it as an umbrella. Things like that. But if we just take the data at face value, we get a weather sequence that keeps changing, whereas our transition model predicts that the weather tends to stay as it is.

So what is the most likely weather sequence?

One idea would be to use smoothing on this sequence: run all the way forward with filtering, run all the way backward with the smoothing recursion, and read off the most likely state at each time step.

Of course, that is not going to work, because this process does not give us the most likely sequence, only the sequence of individually most likely states.

The posterior distributions we get from smoothing each range over a single time step only.

And this difference matters.
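To make this concrete, here is a minimal sketch of what the smoothing approach computes on the umbrella sequence. The transition and sensor probabilities (rain persists with probability 0.7; the umbrella is seen with probability 0.9 if it rains and 0.2 if it does not) are the standard textbook parameters for this example and are assumed here, not restated in the lecture.

```python
import numpy as np

# Assumed umbrella model, states ordered [rain, sun].
T = np.array([[0.7, 0.3],            # transition model P(X_{t+1} | X_t)
              [0.3, 0.7]])
O = {True:  np.array([0.9, 0.2]),    # sensor model P(umbrella = true  | X)
     False: np.array([0.1, 0.8])}    # sensor model P(umbrella = false | X)
prior = np.array([0.5, 0.5])
evidence = [True, True, False, True, True]   # the umbrella sequence from the lecture

# Forward pass (filtering).
f = [prior]
for e in evidence:
    msg = O[e] * (T.T @ f[-1])
    f.append(msg / msg.sum())

# Backward pass combined with the forward messages (smoothing).
b = np.ones(2)
smoothed = [None] * len(evidence)
for t in range(len(evidence), 0, -1):
    s = f[t] * b
    smoothed[t - 1] = s / s.sum()
    b = T @ (O[evidence[t - 1]] * b)

# Sequence of individually most likely states -- NOT the most likely sequence.
states = ["rain", "sun"]
print([states[int(np.argmax(s))] for s in smoothed])
```

Taking the argmax of each smoothed distribution separately is exactly the per-time-step reading that this part of the lecture warns against.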

You can see that relatively easily by constructing suitable models.

Say you have a deterministic weather model, but a really, really bad sensor. Then you can make all kinds of observations, but only two weather sequences are possible at all: the one where it is always sunshine and the one where it is always rain, because in the model the weather never changes at all. There you can see the difference directly, so this is something you might want to look at.

A good way of thinking about this is to look at the set of all state sequences as the set of all paths through a graph: a graph with one node per state and time step, fully connected from each time step to the next, so that any true-false sequence maps to a path through it. The umbrella observation sequence (I have drawn the corresponding edges fat to be in sync with my example) then forces certain likelihoods onto those paths.
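As a small sketch of this path view (same assumed parameters as in the smoothing sketch above), one could in principle enumerate all 2^5 weather sequences and score each path jointly with the umbrella observations; the algorithm derived below computes the same argmax without ever enumerating the paths.

```python
from itertools import product

# Assumed umbrella model, written as dictionaries for readability.
T = {("rain", "rain"): 0.7, ("rain", "sun"): 0.3,
     ("sun", "rain"): 0.3, ("sun", "sun"): 0.7}
O = {("rain", True): 0.9, ("rain", False): 0.1,
     ("sun", True): 0.2, ("sun", False): 0.8}
prior = {"rain": 0.5, "sun": 0.5}
evidence = [True, True, False, True, True]

def joint(path):
    """Joint probability of a weather path together with the observed umbrellas."""
    p = prior[path[0]] * O[(path[0], evidence[0])]
    for t in range(1, len(path)):
        p *= T[(path[t - 1], path[t])] * O[(path[t], evidence[t])]
    return p

# Brute force over all paths through the graph.
best = max(product(["rain", "sun"], repeat=len(evidence)), key=joint)
print(best, joint(best))
```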

So the idea of what we want to compute here is most likely explanation, not maximum likelihood estimation; maximum likelihood estimation is a different technique.

What you do is set up a recursive algorithm again, where you look for the most likely path, think paths here, up to some state x_t, plus one more step to X_{t+1}. We recurse on whole paths here. So we maximize, over the sequences up to time t, the probability of that fixed path together with the full distribution over the next state, given the evidence up to t+1.

This is something we can compute: we split the probability into three factors and then pull the maxes inwards, so that each max only ranges over the variables its factors actually depend on. The first factor, the sensor model for the new evidence, is independent of the x_t's, because it only involves the last state; then we get a max over just x_t, and inside it a max over x_1 through x_{t-1}. And that inner part, you can see, is again just a recursive call.

So just by some very simple analysis, we get a recursive procedure.
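Written out, the recursion just described has the following standard form (the message notation m_{1:t} is a textbook convention added here, not something stated in the lecture):

\[
\max_{x_1 \ldots x_t} P(x_1,\ldots,x_t,X_{t+1}\mid e_{1:t+1})
= \alpha\, P(e_{t+1}\mid X_{t+1})\,
\max_{x_t}\Bigl(P(X_{t+1}\mid x_t)
\max_{x_1 \ldots x_{t-1}} P(x_1,\ldots,x_{t-1},x_t\mid e_{1:t})\Bigr)
\]

so the message passed forward is \( m_{1:t+1} = P(e_{t+1}\mid X_{t+1})\,\max_{x_t}\bigl(P(X_{t+1}\mid x_t)\, m_{1:t}\bigr) \).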

And it works exactly like filtering: we move forward in time, but instead of passing the f message, which is the term we had for filtering, we pass this max message around.

This algorithm is called the Viterbi algorithm, named after the person who invented it, of course.

And if you look at this, we get linear time complexity again, but we also get linear space complexity, because we have to keep back pointers to identify the most likely path. And paths are long, they grow with every time step, which gives us one component that grows linearly; so, unlike filtering, this cannot run in constant space.
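Here is a minimal Viterbi sketch on the same assumed umbrella parameters, with the back pointers kept explicitly; the back-pointer list is exactly the component that grows linearly with the length of the sequence.

```python
import numpy as np

# Assumed umbrella model, states ordered [rain, sun].
T = np.array([[0.7, 0.3],
              [0.3, 0.7]])              # transition model P(X_{t+1} | X_t)
O = {True:  np.array([0.9, 0.2]),
     False: np.array([0.1, 0.8])}       # sensor model P(umbrella | X)
prior = np.array([0.5, 0.5])
evidence = [True, True, False, True, True]

m = O[evidence[0]] * prior               # max message for t = 1
back = []                                # back pointers, one array per time step
for e in evidence[1:]:
    scores = T * m[:, None]              # scores[i, j] = m[i] * P(x_j | x_i)
    back.append(scores.argmax(axis=0))   # best predecessor for each state j
    m = O[e] * scores.max(axis=0)        # new max message

# Follow the back pointers from the best final state to recover the path.
path = [int(m.argmax())]
for bp in reversed(back):
    path.append(int(bp[path[-1]]))
path.reverse()
print([["rain", "sun"][s] for s in path])
```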

So, as I said, most likely explanation is an important inference task, in the sense that it is what you actually need for solving the sensing problem.

It is used for speech recognition and similar applications, where you have a noisy channel and have to find out what was actually being said, that is, the most likely explanation for the observation sequence.

And having a linear-time, linear-space algorithm is a good thing for that.

Of course, we expect it to be linear in time, because as the sequence grows, we have to process more data.

Unfortunately, we cannot do it in constant space.

You may have noticed this yourself: when you are hearing something over a noisy channel, the longer it goes on, the more you are tempted to say, oh, wait, wait, can I have that again?

And that's kind of, that's, we don't really know.
