There we go.
Okay.
Before we start, I want to get into one of the quiz questions, because I spent the week double- and triple-checking that it's correct, and it annoyed me extremely, so now you have to suffer with me.
Remember that in a hidden Markov model, or more precisely in basically any single-variable Markov chain, we have our transition matrix, which is defined by T_ij being the probability of X_t being j given X_{t-1} being i.
Which means that if we want to compute the distribution for X_t via marginalization over X_{t-1}, we have to do that by first transposing the matrix and then multiplying it with the distribution over X_{t-1}: p(X_t) = T^T p(X_{t-1}).
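To make that concrete, here is a minimal numpy sketch of that one-step update (the matrix and the distribution are made-up illustrative values):

```python
import numpy as np

# Illustrative 2-state chain: T[i, j] = P(X_t = j | X_{t-1} = i),
# so each *row* of T sums to 1.
T = np.array([[0.9, 0.1],
              [0.3, 0.7]])

# Distribution over X_{t-1} as a column vector.
p_prev = np.array([0.5, 0.5])

# Marginalization: p(X_t = j) = sum_i p(X_{t-1} = i) * T[i, j],
# which with column vectors forces the transpose.
p_next = T.T @ p_prev
print(p_next)  # [0.6 0.4]
```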
And I double- and triple-checked that that's correct, and I double- and triple-checked that the definition of the transition matrix is correct, because this is obviously very annoying.
Computing the next distribution is the primary purpose of having a transition matrix in the first place, so why the hell do we define it such that we have to transpose it basically every time we use it?
Someone else asked the same question last week, and I answered that it's because that's the convention with Markov chains, which is correct and entirely unhelpful.
So I went down a rabbit hole to find out why, and the explanation is not very satisfying either, because it's basically that in the context of Markov chains we have the convention of considering distributions to be row vectors instead of column vectors.
Which means we can compute this as p(X_{t-1}) times T, with no transpose.
Still not very satisfying.
Why the hell would they use row vectors instead of column vectors?
So we're back to the same question, and the reason, I'm now guessing, is this: let's assume we have a non-stationary process, so we have a different transition matrix T_t at every time step t.
And okay, fair enough: p(X_t) = p(X_{t-1}) T_t.
Okay, let's add one more step, just for fun.
The best explanation that I can come up with is basically that if we do it this way, I get T_t here, then T_{t+1}, then T_{t+2}, and I would get T_{t+3} appended on the right if I wanted to do one more step, and so on and so forth: p(X_{t+2}) = p(X_{t-1}) T_t T_{t+1} T_{t+2}.
If we work with column vectors we obviously get it the other way around: p(X_{t+2}) = T_{t+2}^T T_{t+1}^T T_t^T p(X_{t-1}), and so on and so forth.
In other words, if we do it the way that every sensible person who uses column vectors would do it, time flows from right to left, which is a bit annoying.
And I think the convention to use row vectors instead, with the matrix defined this way, is just so that time flows from left to right in these kinds of products.
That's the best explanation I can come up with.
Does that make sense?
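To see the two conventions side by side, here is a small numpy sketch with made-up, non-stationary transition matrices (all names and values here are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_transition_matrix(n):
    # Row-stochastic: T[i, j] = P(next = j | current = i), rows sum to 1.
    M = rng.random((n, n))
    return M / M.sum(axis=1, keepdims=True)

# Non-stationary process: a different transition matrix per time step.
T_t, T_t1, T_t2 = (random_transition_matrix(3) for _ in range(3))
p = np.array([0.2, 0.5, 0.3])  # distribution over X_{t-1}

# Row-vector convention: time indices grow from left to right.
row = p @ T_t @ T_t1 @ T_t2

# Column-vector convention: same numbers, but time flows right to left.
col = T_t2.T @ T_t1.T @ T_t.T @ p

print(np.allclose(row, col))  # True
```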
So, to recap what we did last week: I finally fixed the slides, so the algorithm is now correct.
So what does Viterbi do?
First we start with the prior distribution over X_0 as our m vector; then we iterate over all time steps up to time T. Keep in mind, the idea here is that we want to compute the most likely explanation for a given sequence of observations.
We iterate from 1 to T and compute the next value for m.
For every element of the domain of our random variable, we remember the predecessor that achieved the maximum, so that we can backtrack later.
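As a concrete sketch of that recursion (the emission matrix E, the example numbers, and the integer-coded observations are my own illustrative assumptions, not from the slides):

```python
import numpy as np

def viterbi(prior, T, E, obs):
    # prior: p(X_0), shape (n,)
    # T:     T[i, j] = P(X_t = j | X_{t-1} = i)
    # E:     E[i, o] = P(e = o | X = i)
    # obs:   observed symbols e_1, ..., e_T as integer indices
    m = prior.copy()                 # m starts as the prior over X_0
    backptr = []
    for e in obs:
        scores = m[:, None] * T                # scores[i, j] = m[i] * T[i, j]
        backptr.append(scores.argmax(axis=0))  # best predecessor per value j
        m = E[:, e] * scores.max(axis=0)
    # Backtrack from the best final value through the stored argmaxes.
    path = [int(m.argmax())]
    for bp in reversed(backptr):
        path.append(int(bp[path[-1]]))
    return path[::-1]                # most likely sequence x_0, ..., x_T

prior = np.array([0.6, 0.4])
T = np.array([[0.7, 0.3],
              [0.4, 0.6]])
E = np.array([[0.9, 0.1],
              [0.2, 0.8]])
print(viterbi(prior, T, E, obs=[0, 0, 1]))  # [0, 0, 0, 1]
```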