33 - Pattern Recognition [PR] - PR 29 [ID:23769]

Welcome back everybody to Pattern Recognition. Today we want to look into a probabilistic estimation technique that allows us to estimate hidden information: the so-called expectation maximization algorithm.

So let's look into our slides. The topic today is the expectation maximization algorithm, and we want to use it for parameter estimation. The goal is the derivation of a parameter estimation technique that is able to deal with high-dimensional parameter spaces, latent and hidden variables, as well as incomplete data. So far we have seen the following techniques.

We have seen maximum likelihood estimation, which essentially assumes that all observations are mutually statistically independent; the observations are kept fixed, and the log-likelihood function is then optimized with respect to the parameters. The other technique that we looked into was maximum a posteriori estimation. There we assume that there is a probability density function of our parameters, and of course we have to know what this function looks like in order to determine its parameters.

Generally, we can formulate this with an observed random variable X and some parameter set theta. The estimates of theta are denoted by theta hat, and x is an event that is assigned to the random variable X. This gives us maximum likelihood estimation, where we determine the optimal parameters theta hat by maximizing the likelihood function, which we can typically write up as a joint probability function over the observations.
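As a compact sketch of what is described verbally above (the symbols x_1, ..., x_m for the m observations are shorthand and not necessarily the slide notation):

```latex
\hat{\boldsymbol{\theta}}
  = \arg\max_{\boldsymbol{\theta}} \; p(x_1, \ldots, x_m ; \boldsymbol{\theta})
  = \arg\max_{\boldsymbol{\theta}} \; \prod_{i=1}^{m} p(x_i ; \boldsymbol{\theta})
```

The factorization into a product uses the assumed mutual independence of the observations.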

The other one is maximum a posteriori estimation: here we determine our parameter estimate theta hat by maximizing the probability of theta given some x. This can be rewritten using Bayes' rule and marginalization, such that we finally end up maximizing the logarithm of the prior of theta plus the logarithm of the probability of x given theta. So we essentially have a generative model where we know the distribution of x given theta, and we are only interested in determining the optimal parameters. Theta is considered a random variable, and its probability density function is known.
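Again as a sketch, the maximization just described can be written as follows; the evidence p(x) can be dropped because it does not depend on theta:

```latex
\hat{\boldsymbol{\theta}}
  = \arg\max_{\boldsymbol{\theta}} \; p(\boldsymbol{\theta} \mid x)
  = \arg\max_{\boldsymbol{\theta}} \; \frac{p(x \mid \boldsymbol{\theta}) \, p(\boldsymbol{\theta})}{p(x)}
  = \arg\max_{\boldsymbol{\theta}} \; \bigl( \log p(\boldsymbol{\theta}) + \log p(x \mid \boldsymbol{\theta}) \bigr)
```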

Let's look into an example using maximum likelihood estimation. Here we assume a Gaussian, and remember that the Gaussian and the normal distribution are of course also very relevant for the exam. You see that we have this classical formulation, as it popped up at various instances in our class, and we observe random vectors x_1 to x_m that form our training data. Based on these training data we now have to estimate the mean vector mu and the covariance matrix Sigma.
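The classical formulation referred to here is presumably the multivariate normal density; written out as a sketch for a d-dimensional vector x:

```latex
p(\boldsymbol{x} ; \boldsymbol{\mu}, \boldsymbol{\Sigma})
  = \frac{1}{\sqrt{(2\pi)^{d} \, |\boldsymbol{\Sigma}|}}
    \exp\!\Bigl( -\tfrac{1}{2} (\boldsymbol{x} - \boldsymbol{\mu})^{\mathsf T} \boldsymbol{\Sigma}^{-1} (\boldsymbol{x} - \boldsymbol{\mu}) \Bigr)
```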

How can we do that? Well, we use the idea of maximum likelihood estimation: we assume that our samples are mutually independent, which allows us to write everything up as one large product, and this product is then maximized with respect to mu and Sigma. We can equivalently use the log-likelihood function; if we do so, we can pull the logarithm in, our product turns into a sum, and this allows us to write the problem generally as the maximization of the log-likelihood function, where the observations are the input and we seek to maximize it with respect to our mu and Sigma.
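As a sketch of the maximization problem just described:

```latex
(\hat{\boldsymbol{\mu}}, \hat{\boldsymbol{\Sigma}})
  = \arg\max_{\boldsymbol{\mu},\, \boldsymbol{\Sigma}} \; \prod_{i=1}^{m} p(\boldsymbol{x}_i ; \boldsymbol{\mu}, \boldsymbol{\Sigma})
  = \arg\max_{\boldsymbol{\mu},\, \boldsymbol{\Sigma}} \; \sum_{i=1}^{m} \log p(\boldsymbol{x}_i ; \boldsymbol{\mu}, \boldsymbol{\Sigma})
```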

And here is our definition of the log-likelihood function: it is simply the sum over the respective logarithmized probabilities. If you use it, you have to compute the partial derivatives in order to find the point of optimality, and necessary conditions for these parameters are of course that the partial derivative with respect to mu equals zero and the partial derivative with respect to Sigma equals zero.
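In symbols, with L denoting the log-likelihood from above, the necessary conditions are:

```latex
\frac{\partial L(\boldsymbol{\mu}, \boldsymbol{\Sigma})}{\partial \boldsymbol{\mu}} = \boldsymbol{0},
\qquad
\frac{\partial L(\boldsymbol{\mu}, \boldsymbol{\Sigma})}{\partial \boldsymbol{\Sigma}} = \boldsymbol{0}
```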

So we can actually do that: we apply the definition of our Gaussian and see that the factor in front of the exponential can essentially be removed, because it is not relevant for the optimization with respect to mu. We then only remain with the exponent, whose derivative we set to zero. If we solve the above equation, we essentially get the sum over all observations divided by the number of observations as the estimated mean vector.
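Written out, this is simply the sample mean (a sketch in the notation from above):

```latex
\hat{\boldsymbol{\mu}} = \frac{1}{m} \sum_{i=1}^{m} \boldsymbol{x}_i
```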

Now we can also estimate the covariance matrix, and we use the same trick: we compute the partial derivative of the log-likelihood function, this time with respect to Sigma, and determine its zero crossings. We can see that the covariance matrix can then be estimated as one over m, where m is the number of training samples, times a sum of terms that are each built from essentially twice the same vector: x_i minus mu hat times x_i minus mu hat transposed. These are both column vectors, so each term is a rank-one matrix, and we then sum over all training samples.
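Written out, this is the (biased) sample covariance:

```latex
\hat{\boldsymbol{\Sigma}} = \frac{1}{m} \sum_{i=1}^{m} (\boldsymbol{x}_i - \hat{\boldsymbol{\mu}}) (\boldsymbol{x}_i - \hat{\boldsymbol{\mu}})^{\mathsf T}
```

As a small numerical sketch (not from the lecture; the function name and the synthetic data are illustrative assumptions), both closed-form ML estimates can be computed like this:

```python
import numpy as np

def gaussian_ml_estimate(X):
    """Maximum likelihood estimates of a Gaussian's mean and covariance.

    X is an (m, d) array holding m training vectors x_1, ..., x_m as rows.
    Returns (mu_hat, sigma_hat), where sigma_hat divides by m (ML, biased),
    not by m - 1 (the unbiased sample covariance).
    """
    m = X.shape[0]
    mu_hat = X.mean(axis=0)                 # (1/m) * sum_i x_i
    centered = X - mu_hat                   # rows are x_i - mu_hat
    sigma_hat = centered.T @ centered / m   # (1/m) * sum_i (x_i - mu_hat)(x_i - mu_hat)^T
    return mu_hat, sigma_hat

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_mu = np.array([1.0, -2.0])
    true_sigma = np.array([[2.0, 0.5],
                           [0.5, 1.0]])
    X = rng.multivariate_normal(true_mu, true_sigma, size=5000)
    mu_hat, sigma_hat = gaussian_ml_estimate(X)
    print(mu_hat)      # should be close to true_mu
    print(sigma_hat)   # should be close to true_sigma
```

Note that dividing by m makes the ML covariance estimate biased; the familiar unbiased estimator divides by m - 1 instead.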

Part of a video series
Access: Open Access
Duration: 00:16:16 min
Recording date: 2020-11-14
Uploaded: 2020-11-14 22:37:37
Language: en-US

In this video, we have a look at the expectation maximization algorithm using Gaussian Mixture Models.

This video is released under CC BY 4.0. Please feel free to share and reuse.

For reminders to watch new videos, follow us on Twitter or LinkedIn. Also, join our network for information about talks, videos, and job offers in our Facebook and LinkedIn groups.

Music Reference: Damiano Baldoni - Thinking of You
