There we go.
Okay.
Before we start, I want to get into one of the quiz questions, because I spent the week double- and triple-checking that it's correct, and it annoyed me extremely, so now you have to suffer with me.
Remember that in a hidden Markov model, or more precisely in basically any single-variable Markov chain, we have our transition matrix, which is defined by T_ij being the probability of X_t being j given X_{t-1} being i.
Which means that if we want to compute the distribution for X_t via marginalization over X_{t-1}, we have to do that by first transposing the matrix and then multiplying it with the distribution over X_{t-1}: p(X_t) = T^T p(X_{t-1}).
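To make that concrete, here is a minimal numpy sketch of that one-step update (the matrix and the distribution are made-up illustrative values):

```python
import numpy as np

# Illustrative 2-state chain: T[i, j] = P(X_t = j | X_{t-1} = i),
# so each *row* of T sums to 1.
T = np.array([[0.9, 0.1],
              [0.3, 0.7]])

# Distribution over X_{t-1} as a column vector.
p_prev = np.array([0.5, 0.5])

# Marginalization: p(X_t = j) = sum_i p(X_{t-1} = i) * T[i, j],
# which with column vectors forces the transpose.
p_next = T.T @ p_prev
print(p_next)  # [0.6 0.4]
```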
And I double- and triple-checked that that's correct, and I double- and triple-checked that the definition of the transition matrix is correct, because this is obviously very annoying.
Computing the next distribution is the primary purpose of having a transition matrix in the first place, so why the hell do we define it such that we have to transpose it basically every time we use it?
Someone else asked the same question last week, and I answered that it's because that's the convention with Markov chains, which is correct and entirely unhelpful.
So I went down a rabbit hole to find out why, and the explanation is not very satisfying either, because it's basically that in the context of Markov chains we have the convention of considering distributions to be row vectors instead of column vectors.
Which means we can compute this as p(X_{t-1}) times T, with no transpose.
Still not very satisfying.
Why the hell would they use row vectors instead of column vectors?
So we're back to the same question, and the reason, I'm now guessing, is this: let's assume we have a non-stationary process, so we have a different transition matrix T_t at every time step t.
And okay, fair enough: p(X_t) = p(X_{t-1}) T_t.
Okay, let's add one more step, just for fun.
The best explanation that I can come up with is basically that if we do it this way, I get T_t here, then T_{t+1}, then T_{t+2}, and I would get T_{t+3} appended on the right if I wanted to do one more step, and so on and so forth: p(X_{t+2}) = p(X_{t-1}) T_t T_{t+1} T_{t+2}.
If we work with column vectors we obviously get it the other way around: p(X_{t+2}) = T_{t+2}^T T_{t+1}^T T_t^T p(X_{t-1}), and so on and so forth.
In other words, if we do it the way that every sensible person who uses column vectors would do it, time flows from right to left, which is a bit annoying.
And I think the convention to use row vectors instead, with the matrix defined this way, is just so that time flows from left to right in these kinds of products.
That's the best explanation I can come up with.
Does that make sense?
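To see the two conventions side by side, here is a small numpy sketch with made-up, non-stationary transition matrices (all names and values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_transition_matrix(n):
    # Row-stochastic: T[i, j] = P(next = j | current = i), rows sum to 1.
    M = rng.random((n, n))
    return M / M.sum(axis=1, keepdims=True)

# Non-stationary process: a different transition matrix per time step.
T_t, T_t1, T_t2 = (random_transition_matrix(3) for _ in range(3))
p = np.array([0.2, 0.5, 0.3])  # distribution over X_{t-1}

# Row-vector convention: time indices grow from left to right.
row = p @ T_t @ T_t1 @ T_t2

# Column-vector convention: same numbers, but time flows right to left.
col = T_t2.T @ T_t1.T @ T_t.T @ p

print(np.allclose(row, col))  # True
```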
So, to recap what we did last week: I finally fixed the slides, so the algorithm is now correct.
So what does Viterbi do?
First we start with the prior distribution over X_0 as our m vector; then we iterate over all time steps up to time T. Keep in mind, the idea here is that we want to compute the most likely explanation for a given sequence of observations.
We iterate from 1 to T and compute the next value for m.
For every element of the domain of our random variable, we remember the predecessor that achieved the maximum, so that we can backtrack later.
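As a concrete sketch of that recursion (the emission matrix E, the example numbers, and the integer-coded observations are my own illustrative assumptions, not from the slides):

```python
import numpy as np

def viterbi(prior, T, E, obs):
    # prior: p(X_0), shape (n,)
    # T:     T[i, j] = P(X_t = j | X_{t-1} = i)
    # E:     E[i, o] = P(e = o | X = i)
    # obs:   observed symbols e_1, ..., e_T as integer indices
    m = prior.copy()                 # m starts as the prior over X_0
    backptr = []
    for e in obs:
        scores = m[:, None] * T                # scores[i, j] = m[i] * T[i, j]
        backptr.append(scores.argmax(axis=0))  # best predecessor per value j
        m = E[:, e] * scores.max(axis=0)
    # Backtrack from the best final value through the stored argmaxes.
    path = [int(m.argmax())]
    for bp in reversed(backptr):
        path.append(int(bp[path[-1]]))
    return path[::-1]                # most likely sequence x_0, ..., x_T

prior = np.array([0.6, 0.4])
T = np.array([[0.7, 0.3],
              [0.4, 0.6]])
E = np.array([[0.9, 0.1],
              [0.2, 0.8]])
print(viterbi(prior, T, E, obs=[0, 0, 1]))  # [0, 0, 0, 1]
```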