38 - Pattern Recognition [PR] - PR 34 [ID:23885]

Welcome back to Pattern Recognition. Today we want to explore a few more ideas about independent component analysis (ICA). We've seen so far that non-Gaussianity is an important property of the independent components. So in today's video we want to look into different measures of how to actually determine this non-Gaussianity.

So we've seen that the key principle in estimating the independent components is non-Gaussianity. In order to optimize the independent components, we need a quantitative measure of non-Gaussianity. Furthermore, let's consider our random variable Y and assume that it has zero mean and unit variance; of course, we enforce this already by appropriate pre-processing. Now we will consider three measures of non-Gaussianity: the kurtosis, the negentropy, and the mutual information.

Let's start with the kurtosis. The kurtosis is defined as the expected value of Y to the power of 4, minus 3 times the square of the expected value of Y to the power of 2. If Y has zero mean and unit variance, you can see that this simplifies to the expected value of Y to the power of 4 minus 3, because the variance is simply 1 and the second term reduces to the constant 3.
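Written out, the definition from above and its simplification for a whitened signal read:

```latex
\operatorname{kurt}(y) = \mathbb{E}\{y^4\} - 3\left(\mathbb{E}\{y^2\}\right)^2
\qquad\xrightarrow{\;\mathbb{E}\{y\}=0,\ \mathbb{E}\{y^2\}=1\;}\qquad
\operatorname{kurt}(y) = \mathbb{E}\{y^4\} - 3 .
```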

Now, if you have two independent random variables Y1 and Y2, then linearity properties hold: the kurtosis of Y1 plus Y2 is given as the sum of the kurtosis of Y1 and the kurtosis of Y2, and a scaling with a factor alpha results in the kurtosis of Y multiplied with the factor alpha to the power of 4; here, alpha is a scalar value.
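In short, for independent y1 and y2 and a scalar alpha:

```latex
\operatorname{kurt}(y_1 + y_2) = \operatorname{kurt}(y_1) + \operatorname{kurt}(y_2),
\qquad
\operatorname{kurt}(\alpha y) = \alpha^4 \operatorname{kurt}(y).
```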

Let's have a look at the kurtosis of a Gaussian distribution. The n-th central moment of a Gaussian distribution with p(y) equal to the normal distribution with mean mu and variance sigma squared is the expected value of (Y minus mu) to the power of n, and this is given as (n minus 1) double factorial times sigma to the power of n if n is even, and zero if n is odd. So for a zero-mean, unit-variance random variable Y that is normally distributed, we will have a kurtosis of zero. The kurtosis is zero for a Gaussian random variable with zero mean and unit variance. For most, but not all, non-Gaussian random variables, the kurtosis is non-zero. The kurtosis can be positive or negative, and typically the non-Gaussianity is then measured as the absolute value of the kurtosis or the squared kurtosis.
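In formulas, the central moments and the resulting kurtosis of a Gaussian are:

```latex
\mathbb{E}\{(y-\mu)^n\} =
\begin{cases}
(n-1)!!\,\sigma^n & \text{if } n \text{ is even},\\[2pt]
0 & \text{if } n \text{ is odd},
\end{cases}
\qquad\Rightarrow\qquad
\operatorname{kurt}(y) = 3\sigma^4 - 3\left(\sigma^2\right)^2 = 0 .
```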

Let's look into a sub-Gaussian probability density function. Here we choose the uniform distribution, and you can see that in this case the kurtosis is negative. If we have a super-Gaussian probability density function, for example the Laplacian distribution, then you will see that the kurtosis is greater than zero.
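As a minimal sketch of how one might verify these signs empirically (using NumPy; the sample size and the helper name `empirical_kurtosis` are illustrative choices, not part of the lecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_kurtosis(y):
    """Sample kurtosis E{y^4} - 3 (E{y^2})^2 after standardizing y."""
    y = (y - y.mean()) / y.std()   # enforce zero mean and unit variance
    return np.mean(y**4) - 3.0

n = 1_000_000
samples = {
    "gaussian":  rng.standard_normal(n),      # kurtosis ~ 0
    "uniform":   rng.uniform(-1.0, 1.0, n),   # sub-Gaussian, kurtosis ~ -1.2
    "laplacian": rng.laplace(0.0, 1.0, n),    # super-Gaussian, kurtosis ~ +3
}
for name, y in samples.items():
    print(f"{name:9s}: kurtosis = {empirical_kurtosis(y):+.3f}")
```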

Now let's consider the 2D case using a linear combination: we can express our Y as w transpose x. If we replace x with the mixing matrix A applied to the original signals s, we can rewrite this as an inner product of a vector z with s, where z equals A transpose w. With two variables, this is z1 times s1 plus z2 times s2. Then the kurtosis of Y is given as the kurtosis of z1 times s1 plus the kurtosis of z2 times s2, and using our scaling property this can be rewritten as z1 to the power of 4 times the kurtosis of s1 plus z2 to the power of 4 times the kurtosis of s2. As Y, like s1 and s2, has unit variance, we can now write up the expected value of Y squared, and you see that this is z1 to the power of 2 plus z2 to the power of 2, which is supposed to be 1 because of our scaling. So this constrains our z to the unit circle in the 2D plane.
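Putting these steps together:

```latex
y = \mathbf{w}^{\top}\mathbf{x}
  = \mathbf{w}^{\top}\mathbf{A}\,\mathbf{s}
  = \mathbf{z}^{\top}\mathbf{s}
  = z_1 s_1 + z_2 s_2,
\qquad \mathbf{z} = \mathbf{A}^{\top}\mathbf{w},

\operatorname{kurt}(y)
  = \operatorname{kurt}(z_1 s_1) + \operatorname{kurt}(z_2 s_2)
  = z_1^4\,\operatorname{kurt}(s_1) + z_2^4\,\operatorname{kurt}(s_2),

\mathbb{E}\{y^2\} = z_1^2 + z_2^2 = 1 .
```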

Now we have to find the maximum of this function on the unit circle with respect to z. So the absolute value of the kurtosis is given by the absolute value of the reformulated kurtosis with respect to the two signals, and here we have a couple of examples for the landscape of the kurtosis in the 2D plane. The thick curve is the unit circle, and the thin curves are isocontours of the objective function. You see that the maxima are located at sparse values of z; for example, you find them where exactly one component of z is plus or minus 1, i.e. where y equals plus or minus s_i.

So how would we maximize the non-Gaussianity of a vector w in practice? You start with some initial vector w, then you use a gradient method to maximize the absolute value of the kurtosis, and of course you want to do that after transforming the observations with w, so you plug in w transpose x here. Then you plug this optimization into the ICA estimation algorithm that we've seen in the previous video. Let's visualize the kurtosis as a function of the direction of the projection, and here you can …
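As a sketch of this gradient-based scheme for whitened data (the step size, iteration count, and the function name `kurtosis_ica_direction` are illustrative assumptions, not the exact algorithm from the lecture):

```python
import numpy as np

def kurtosis_ica_direction(X, lr=0.1, n_iter=200, seed=0):
    """Find one projection direction w by gradient ascent on |kurt(w^T x)|.

    X : (d, n) array of whitened observations (zero mean, identity covariance).
    Returns a unit-norm weight vector w.
    """
    rng = np.random.default_rng(seed)
    d, n = X.shape
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)

    for _ in range(n_iter):
        y = w @ X                              # projected signal, shape (n,)
        kurt = np.mean(y**4) - 3.0             # kurtosis; ||w|| = 1 keeps unit variance
        # gradient of kurt(w^T x) w.r.t. w: 4 E{(w^T x)^3 x} - 12 ||w||^2 w,
        # with ||w|| = 1 the second term is just 12 w
        grad = 4.0 * (X @ y**3) / n - 12.0 * w
        w += lr * np.sign(kurt) * grad         # ascend on |kurt| via its sign
        w /= np.linalg.norm(w)                 # project back onto the unit sphere
    return w
```

Renormalizing w after each step keeps the projection on the unit sphere, which corresponds to the unit-circle constraint from the 2D example above.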

Part of a video series:

Accessible via: Open Access

Duration: 00:15:14 min

Recording date: 2020-11-16

Uploaded on: 2020-11-16 08:09:03

Language: en-US

In this video, we discuss three measures to determine "non-Gaussianity".

This video is released under CC BY 4.0. Please feel free to share and reuse.

For reminders to watch new videos, follow us on Twitter or LinkedIn. Also, join our network for information about talks, videos, and job offers in our Facebook and LinkedIn groups.

Music Reference: Damiano Baldoni - Thinking of You
