Hello everybody to the Tuesday session. We teach as usual in English because we have
one non-native speaker here. Today, on our list, we first of all want
to finalize the EM algorithm and then I would like to talk about the hidden Markov models
and we're going to look into the training formulas of hidden Markov models, different
types of hidden Markov models, and then we will also address the key questions that are
always discussed in the context of hidden Markov models. So we know the EM algorithm.
How does the EM algorithm work? The EM algorithm is an iterative scheme that more or less emulates
maximum likelihood estimation. That was the core idea: if this is the likelihood
function, the EM algorithm iteratively maximizes a function like that and ends up here in the
maximum, provided the initialization was done properly such that the first guess was in the area
of attraction of the global maximum. What is the area of attraction? The area of attraction
is the set of all points where the gradient points towards the global maximum. Welcome!
Do you know the company BrainLab? BrainLab is dealing with brains, and it's very interesting
to see what they are doing. We will visit them on July 27th,
and if you want you can join us; we will travel to Munich and listen to a company presentation
over there. Do you want to attend? I don't know right now, but I would like to. Good. So that's
the EM algorithm, and the EM algorithm needs to characterize this function here. This function
is, as expected, an expectation. This is the Q function that we see over there, the function that is
iteratively maximized. How is the Q function defined? Well, you can remember the formula. You
can also say that the EM algorithm is dealing with the missing information principle, meaning
the observable information is the complete information minus the hidden information,
and the complete part is the one we know how to write down; that is more or less the core of the Q function.
You multiply both sides with P of Y given X and the current parameters and so on, and then you integrate
out the hidden variable Y. That's the Q function.
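As a sketch of what was just described, written in generic notation of my own choosing (x the observation, y the hidden variable, theta the parameters, theta-hat the current estimate; not necessarily the symbols used on the board), the missing information principle in log form is

\[
\log p(x;\theta) \;=\; \log p(x,y;\theta) \;-\; \log p(y \mid x;\theta),
\]

and multiplying both sides with \(p(y \mid x;\hat\theta^{(i)})\) and integrating out the hidden variable gives the Q function that the iteration maximizes:

\[
Q(\theta;\hat\theta^{(i)}) \;=\; \int \log p(x,y;\theta)\, p(y \mid x;\hat\theta^{(i)})\, dy,
\qquad
\hat\theta^{(i+1)} \;=\; \arg\max_{\theta}\; Q(\theta;\hat\theta^{(i)}).
\]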
Then you iteratively maximize the Q function, and you know in the Q function there is a logarithm. The logarithm
splits the product of probabilities into a sum of log probabilities, and this property leads to
the fact that many optimization problems can be decomposed into subspace optimization
problems, due to the fact that the logarithm breaks up the product into a sum: when you compute
the gradient, the terms that are independent of the variables you differentiate with respect to
more or less vanish and disappear.
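To make that decomposition concrete, again in generic notation of my own choosing: if the complete-data density factors into a product over components with separate parameters, the logarithm turns it into a sum, and every term that does not contain the parameter you differentiate with respect to drops out of the gradient:

\[
\log \prod_{k} p_k(x,y;\theta_k) \;=\; \sum_{k} \log p_k(x,y;\theta_k),
\qquad
\frac{\partial}{\partial \theta_j} \sum_{k} \log p_k(x,y;\theta_k) \;=\; \frac{\partial}{\partial \theta_j} \log p_j(x,y;\theta_j).
\]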
That's the core of the EM algorithm. We know the EM algorithm is very slow, and it's a local method, but for learning systems that doesn't
matter. I mean, if you have a speech recognizer and you want to train it using some training
samples, you can do that overnight. The parameters are adapted through a night run,
and then you use the system for classification. Good. So that's the EM algorithm.
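To make the iterative scheme concrete, here is a minimal sketch of EM for a one-dimensional mixture of two Gaussians; this example and all names in it (weights, means, variances, responsibilities) are my own illustration, not code from the lecture:

# Hedged sketch, not the lecturer's code: EM for a 1-D mixture of two Gaussians.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic observable data x; the component label of each sample is the hidden variable y.
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 200)])

# Initial guess of the parameters (it should lie in the area of attraction of the global maximum).
weights = np.array([0.5, 0.5])
means = np.array([-1.0, 1.0])
variances = np.array([1.0, 1.0])

def gauss(x, mu, var):
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

for it in range(50):
    # E-step: posterior of the hidden component given x and the current parameters,
    # i.e. the weighting p(y | x, theta_hat) that enters the Q function.
    joint = weights[None, :] * gauss(x[:, None], means[None, :], variances[None, :])
    resp = joint / joint.sum(axis=1, keepdims=True)

    # M-step: maximize Q; thanks to the logarithm the update decouples per component.
    n_k = resp.sum(axis=0)
    weights = n_k / len(x)
    means = (resp * x[:, None]).sum(axis=0) / n_k
    variances = (resp * (x[:, None] - means[None, :]) ** 2).sum(axis=0) / n_k

print("weights:", weights, "means:", means, "variances:", variances)

The E-step computes the posterior of the hidden label under the current parameters, exactly the weighting that enters the Q function; the M-step maximizes Q, and because of the logarithm the update splits into one closed-form expression per component.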
Last time we did so much fun stuff here. Very nice. And then we discussed hidden Markov
models: models where we have states, where we have initial probabilities
P1 P2 P3 and we have transition probabilities A11 A12 and so on. So we have basically three
stochastic processes combined in this stochastic automaton. Which stochastic processes do we
have? We have the probabilities that are used for starting, so P1, P2, P3; that's the
initial probability when we start. Then we have the transition probabilities, and we have
the output probabilities. Okay. Some people say
this is just a combination of two stochastic processes; they introduce a state zero
and start like that, saying: with probability one I start here, and then I go up here, up here, up
here, and this transition produces no output. But that doesn't matter at all. For us, P1,
P2, P3 are the starting or the initial probabilities for the states. And now I can use this hidden
Markov model. Let me just copy it. How can I copy? Let me just introduce a new page.
So let's look at one example. Here we have S1. Here we have S2. Here we have the transition
probabilities A21, A12, A11, A22, and Pi1 and Pi2. Why is it called hidden? Because we have a
wall here, like in our girls example. We have a wall here and I don't see the automaton
from the outside. So this is a wall. It's a brick wall.
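As a small sketch of this two-state automaton in code (the concrete numbers and the two-symbol output alphabet are invented for illustration; only the structure with initial, transition and output probabilities follows the lecture):

# Hedged sketch, not the lecturer's code: a two-state hidden Markov model as a stochastic automaton.
import numpy as np

rng = np.random.default_rng(1)

pi = np.array([0.6, 0.4])               # initial probabilities pi_1, pi_2
A = np.array([[0.7, 0.3],               # transition probabilities a_11, a_12
              [0.2, 0.8]])              #                          a_21, a_22
B = np.array([[0.9, 0.1],               # output probabilities per state over the alphabet {'a', 'b'}
              [0.3, 0.7]])
alphabet = np.array(['a', 'b'])

def sample(length):
    states, outputs = [], []
    s = rng.choice(2, p=pi)                              # start according to the initial probabilities
    for _ in range(length):
        states.append(int(s))
        outputs.append(alphabet[rng.choice(2, p=B[s])])  # emit an output from the current state
        s = rng.choice(2, p=A[s])                        # move to the next state
    return states, outputs

states, outputs = sample(10)
# The "wall": an outside observer sees only the outputs, never the state sequence.
print("observed outputs:", "".join(outputs))
print("hidden states   :", states)

Running this, an observer standing behind the wall sees only the printed output string; the state sequence that produced it stays hidden.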