Okay, there we are. So the current topic is Markov models in their various forms: Markov processes,
hidden Markov models, and so on. We talked about the four primary inference tasks
that we have in these kinds of models, namely filtering, i.e. figuring out the current
state of the world given the evidence so far; prediction, i.e. estimating a future state of the
world given the evidence so far; smoothing, i.e. figuring out the most likely state of the world
at some point in the past given all the evidence so far; and finally most likely explanation,
i.e. figuring out the most likely sequence of states the world has been in given all the evidence
so far. In general, we assume that we have a bunch of state variables X and a bunch of
evidence variables E, where the state variables form a Markov chain, i.e. we have the first-order Markov property,
and we have the sensor Markov property, i.e. we assume that the evidence variables E at every time slice t
only depend on the state variables at that particular point in time. In the special case where
we have a single state variable X and a single evidence variable E, we have a hidden Markov
model, and if we additionally assume stationarity, i.e. that the transition model is the same at
every time slice, then we can use matrix forms to represent all of the equations for these
kinds of tasks. In general, our goal, especially for filtering, is to find a way to compute the
distribution of the current state given the evidence so far in a recursive manner, so that
at every time step we can update our world model and then iterate over all of the time
slices; i.e. we want a recursive function f that takes just the latest percept e at time t and the
distribution computed at the previous time step. In matrix form, for a stationary hidden Markov model,
this is going to look like this: we take the observation matrix for the current percept,
multiply it with the transposed transition matrix, and multiply that with the previous
distribution represented as a vector. Here's the derivation of the whole thing,
and it gives us exactly the recursive function that we want. In a stationary hidden
Markov model, we get this matrix representation. Here it is again: observation matrix,
transposed transition matrix, recursive call on the distribution at the previous time step.
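The matrix-form update described above can be sketched in a few lines of NumPy. The two-state "umbrella world" numbers below are a hypothetical example, not taken from the lecture slides:

```python
import numpy as np

# Sketch of the matrix-form filtering step: f_{1:t} = alpha * O_t * T^T * f_{1:t-1}.
T = np.array([[0.7, 0.3],        # row i: P(X_t | X_{t-1} = i); states are (rain, no rain)
              [0.3, 0.7]])
O_umbrella = np.diag([0.9, 0.2]) # P(umbrella | rain), P(umbrella | no rain) on the diagonal

def forward(f_prev, O_t):
    f = O_t @ T.T @ f_prev       # observation matrix * transposed transition matrix * previous vector
    return f / f.sum()           # alpha: renormalize so the result is a distribution

f = np.array([0.5, 0.5])         # uniform prior over the initial state
f = forward(f, O_umbrella)       # update after observing an umbrella at t = 1
# f[0] = P(rain | umbrella) is now roughly 0.818
```

Each call to `forward` consumes only the latest percept's observation matrix and the previous vector, which is exactly the recursive shape we wanted.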
I think it makes sense to consider what happens if we drop some of the properties of a stationary
hidden Markov model. For example, if we look at this formula here:
what would we have to change about it if we assume that the hidden Markov model is not stationary?
What do we need to change then? Yes? Yes, in what way would we need to change the transition matrix?
Right, it wouldn't be the same for every t, so we would basically have to add an index t here.
So we would have a different transition matrix at every single inference step. Okay, so for
a non-stationary HMM we would have a time index t. What happens if we don't have a hidden
Markov model at all, i.e. we have multiple state variables and multiple evidence variables?
Sorry? Sorry? Indexing T by t, that's what we already get if we just drop stationarity.
Or did I misunderstand? Yes? That would be one way, yes. We could introduce a transition
matrix and an evidence matrix for every single one
of those x t and every single one of those e t, which basically just means we stop using
matrices altogether and just use this formula, the more general one here. Yeah?
Exactly, so the x t minus one would have to, sorry, can you repeat that? I think I misunderstood.
Ah, so in here, instead of x t minus one, we would have x one through x t minus one. Not quite, that is what we would get if we drop the Markov property.
Right? So if we assume the first-order Markov property, we only need to care about this particular state.
If we drop the Markov property altogether, we would have to add all of the x from one to t minus one on the right side of that conditional probability.
We're still interested in the particular state at time t, but then we exploit the Markov property in this derivation. Where do we exploit it?
Hello? Oh, basically in the first line already. No. Here. Yeah, here in the one, two, three, fourth row, where we only marginalize over the previous time step.
Right? So here we do marginalization over x t minus one.
And if we drop the Markov property, we would have to add all of the x from one to t minus one in here, which gives us a giant sum over all previous time steps.
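Written out, the contrast between the two cases is the following (a sketch in the usual filtering notation, keeping the sensor Markov property in both):

```latex
% With the first-order Markov property, we marginalize over x_{t-1} only:
P(X_t \mid e_{1:t}) = \alpha \, P(e_t \mid X_t) \sum_{x_{t-1}} P(X_t \mid x_{t-1}) \, P(x_{t-1} \mid e_{1:t-1})

% Without the Markov property, the marginalization runs over all previous states:
P(X_t \mid e_{1:t}) = \alpha \, P(e_t \mid X_t) \sum_{x_{1:t-1}} P(X_t \mid x_{1:t-1}) \, P(x_{1:t-1} \mid e_{1:t-1})
```

In the second line the sum ranges over all joint assignments to x_1 through x_{t-1}, which is the "giant sum" just mentioned, and the transition model itself must condition on the whole history.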
So that would be if we drop the Markov property. If we introduce additional state variables and additional evidence variables, we would basically just have to
add all of them in here and in here. So instead of a single e and a single x, we would have a sequence of those variables.
By abuse of notation, that's basically what's already here: if you just think of every x t or e t or x t minus one as standing for a sequence of variables,
then this formula does not change at all. It's just that, strictly speaking, the notation doesn't allow that.
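Writing the filtering step without the matrix shorthand makes the non-stationary case easy to see: the models simply become functions of t. The names (`filter_step`, `trans`, `obs`) and the drifting transition model below are made up for illustration, not from the lecture:

```python
import numpy as np

def filter_step(f_prev, t, e_t, trans, obs):
    # f_{1:t}(x_t) = alpha * P(e_t | x_t) * sum_{x_{t-1}} P(x_t | x_{t-1}) * f_{1:t-1}(x_{t-1})
    f = obs(t, e_t) * (trans(t).T @ f_prev)
    return f / f.sum()

# Hypothetical non-stationary transition model: probabilities drift toward
# uniform as t grows (an invented example of a time-indexed T_t).
def trans(t):
    base = np.array([[0.7, 0.3],
                     [0.3, 0.7]])
    lam = 1.0 / (1 + t)
    return (1 - lam) * base + lam * np.full((2, 2), 0.5)

# Sensor model P(e_t | X_t) for a boolean percept (stationary here).
def obs(t, e_t):
    return np.array([0.9, 0.2]) if e_t else np.array([0.1, 0.8])

f = np.array([0.5, 0.5])                  # uniform prior
for t, e in enumerate([True, True, False], start=1):
    f = filter_step(f, t, e, trans, obs)  # one recursive update per time slice
```

The stationary HMM is just the special case where `trans(t)` returns the same matrix for every t, which is what makes the fixed matrix form possible in the first place.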
Presenters
Accessible via
Open access
Duration
01:20:40 min
Recording date
2024-05-23
Uploaded on
2024-05-23 19:39:02
Language
en-US