21 - Pattern Recognition (PR) [ID:2650]

The following content has been provided by the University of Erlangen-Nürnberg.

So welcome everybody to Pattern Recognition, and welcome back in 2013. Maybe you were hoping that Professor Hornegger would show up again; I have to disappoint you, as I think he will not show up for the next two weeks. I will give the lecture today and tomorrow, and I hope that Andreas Maier will give the lecture next week. Today we are talking about one of the most exciting algorithms in pattern recognition: it is about parameter estimation, and the algorithm is called the EM algorithm, expectation maximization.

We will first talk about parameter estimation in general. There are two well-known techniques in statistics: maximum likelihood (ML) estimation and maximum a posteriori (MAP) estimation. We will then apply them to Gaussian mixture models; to be precise, we will first apply them to a simple Gaussian distribution and then to Gaussian mixture models. After that we will focus on the missing information principle, derive the EM algorithm, show the advantages and drawbacks of this algorithm, and apply it to constrained optimization.

So this algorithm was developed with one goal, or rather with two goals. The first goal is that we want to deal with a high-dimensional parameter space, that is, with many parameters that we want to estimate. The second is that we want to deal with latent or hidden data, sometimes also called incomplete data; that is data that you cannot observe, data that is hidden. For example, if we have a Gaussian mixture model, we have a mixture of several Gaussian distributions, together they form one probability density function, and according to this PDF we generate data. So we see the data, the data is observable; what we do not see is which component actually generated each data point, which of the mixture components generated it, as the sketch below illustrates.
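As a small illustration of what "hidden" means here, the following sketch (my own example with illustrative parameter values, not code from the lecture) draws samples from a two-component Gaussian mixture: the samples x are what we observe, while the component indices z that generated them are the latent data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Mixture parameters (illustrative values, not from the lecture):
# two 1-D Gaussian components with weights, means and standard deviations.
weights = np.array([0.3, 0.7])
means = np.array([-2.0, 3.0])
stds = np.array([0.5, 1.0])

m = 1000  # number of samples

# Latent data: which component generates each sample.
z = rng.choice(len(weights), size=m, p=weights)

# Observed data: each x_i is drawn from the Gaussian of its component z_i.
x = rng.normal(loc=means[z], scale=stds[z])

# In practice we only see x; z stays hidden, which is exactly the
# "incomplete data" situation the EM algorithm is designed for.
print(x[:5])   # observable
print(z[:5])   # latent, not available to the estimation procedure
```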

From statistics there are two well-known techniques: maximum likelihood (ML) estimation and maximum a posteriori (MAP) estimation. We will focus on the first one, maximum likelihood estimation. We set up a likelihood function and we maximize this likelihood function with respect to the parameters. The observations are kept fixed, and we assume that all our observations are mutually statistically independent.

What does this mean? Let's say we have the probability p of all our samples. Our samples are called x1, x2, and so on, and let's say we have m different samples. We want to compute the probability that this set of samples is generated; the probability density function depends on certain parameters, and I call this set of parameters theta. That is what we want to maximize: we choose these parameters such that the probability of all our samples is maximized, so we are looking for those parameters that give the highest probability. Our estimate for the parameters is therefore the argmax of this probability. Now we say our samples are statistically independent, which means that the probability of one of the samples does not depend on all the other samples. So I can write this probability in another way: it is again the argmax, but now of the product over all samples of the probability of a single sample xi given the parameters. That is what statistical independence means.
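Written out, the maximization just described reads as follows (a reconstruction in the notation used above, with samples x1, ..., xm and parameter set theta):

```latex
\hat{\boldsymbol{\theta}}
  = \operatorname*{argmax}_{\boldsymbol{\theta}}
      p(\boldsymbol{x}_1, \boldsymbol{x}_2, \ldots, \boldsymbol{x}_m \mid \boldsymbol{\theta})
  = \operatorname*{argmax}_{\boldsymbol{\theta}}
      \prod_{i=1}^{m} p(\boldsymbol{x}_i \mid \boldsymbol{\theta})
```

The second equality is where the assumed mutual statistical independence of the samples is used.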

If you want to maximize this probability and find the position of the maximum, it does not matter whether you apply a monotonic function first. So you can take the logarithm of it: you change the value of the maximum, but you do not change the position of the maximum. So you can just use the logarithm of this product and you still get the same estimate for the parameters.

And the benefit of taking the logarithm is that you can now put the logarithm inside the product, and instead of having a product you have a sum. Handling sums is in general much easier than handling products. This function is called the log-likelihood function.
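In formulas, taking the logarithm turns the product into a sum while leaving the maximizer unchanged (again a reconstruction in the notation above):

```latex
\hat{\boldsymbol{\theta}}
  = \operatorname*{argmax}_{\boldsymbol{\theta}}
      \log \prod_{i=1}^{m} p(\boldsymbol{x}_i \mid \boldsymbol{\theta})
  = \operatorname*{argmax}_{\boldsymbol{\theta}}
      \sum_{i=1}^{m} \log p(\boldsymbol{x}_i \mid \boldsymbol{\theta})
```

For the "simple Gaussian distribution" mentioned in the outline, here is a minimal sketch (my own example, not from the lecture) that evaluates the log-likelihood as a sum over samples and checks that the closed-form ML estimates, the sample mean and the biased sample standard deviation, do better than a perturbed parameter value:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, scale=2.0, size=500)  # observed samples

def gaussian_log_likelihood(x, mu, sigma):
    # Sum over samples of log p(x_i | mu, sigma) for a univariate Gaussian.
    return np.sum(-0.5 * np.log(2.0 * np.pi * sigma**2)
                  - (x - mu) ** 2 / (2.0 * sigma**2))

# Closed-form ML estimates for a univariate Gaussian:
mu_ml = x.mean()
sigma_ml = x.std()   # ML uses the 1/m (biased) variance estimate

# The ML estimate maximizes the log-likelihood; shifting mu away from it
# can only decrease the value.
print(gaussian_log_likelihood(x, mu_ml, sigma_ml))
print(gaussian_log_likelihood(x, mu_ml + 0.5, sigma_ml))  # smaller
```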

The other technique, maximum a posteriori (MAP) estimation, assumes that we have a probability density function of our parameters, so we know which values of our parameters are more likely than other values. And we can treat the parameters
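A compact way to write the MAP estimate just introduced, assuming (as described) a prior density p(theta) over the parameters, is the following standard textbook form, which is not spelled out explicitly in the excerpt above:

```latex
\hat{\boldsymbol{\theta}}_{\mathrm{MAP}}
  = \operatorname*{argmax}_{\boldsymbol{\theta}}
      p(\boldsymbol{\theta} \mid \boldsymbol{x}_1, \ldots, \boldsymbol{x}_m)
  = \operatorname*{argmax}_{\boldsymbol{\theta}}
      \left( \prod_{i=1}^{m} p(\boldsymbol{x}_i \mid \boldsymbol{\theta}) \right)
      p(\boldsymbol{\theta})
```

Here Bayes' rule was applied, and the evidence term p(x1, ..., xm) was dropped from the maximization because it does not depend on theta.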

Part of a video series
Access: Open Access
Duration: 00:51:07 min
Recorded: 2013-01-07
Uploaded: 2013-01-10 13:47:33
Language: en-US
