8 - Kernel Principal Component Analysis

Hello and welcome, everybody, to today's lecture. Last week we discussed the linear principal component analysis, in short PCA. Essentially, we said that whenever we have input data in some data space, we simply compute the covariance matrix of this data, and the eigenvectors of this matrix, or of this linear operator, encode the variance of our data in the respective directions. This was essentially the core idea of the principal component analysis.
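
Just to recap this in code form, here is a minimal sketch of that procedure; the NumPy implementation and the placeholder data are my own illustration and not part of the lecture.

```python
import numpy as np

# placeholder data: N = 200 samples in an m = 5 dimensional input space
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))

# 1) center the data
X_centered = X - X.mean(axis=0)

# 2) covariance matrix of the data (m x m)
C = np.cov(X_centered, rowvar=False)

# 3) eigendecomposition: the eigenvectors encode the directions of variance,
#    the eigenvalues the amount of variance along those directions
eigvals, eigvecs = np.linalg.eigh(C)   # eigh, because C is symmetric

# 4) sort by decreasing variance and project onto the first k principal directions
order = np.argsort(eigvals)[::-1]
k = 2
W = eigvecs[:, order[:k]]
X_pca = X_centered @ W                 # the linear PCA features
```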

In the end, we also saw that the linear principal component analysis has some drawbacks and disadvantages, especially the computational complexity in case of a high dimension of the input space, and also that we can only extract linear features. This means that whenever our data is distributed such that you cannot find a separating hyperplane in your input space, linear PCA will not give you meaningful features for machine learning. So today's topic will be a little bit more sophisticated: we will talk about the kernel PCA, the kernel principal component analysis, and see how we can alleviate some of these problems.

So before we start with the mathematics, let me give you a short motivation of what we will do today. Let's say we have this kind of artificially generated data, which lies on two circles, and we would like to cluster this data. We as humans have no problem with that: we can see that this is an inner circle and an outer circle, so we would easily draw two clusters and say that this here would be cluster one and the outer one would be cluster two, for example. That is what we humans can do, but how can we introduce this to a computer, or to a machine, in terms of mathematics?

Well, if we use our linear PCA on this data in the input space, which is two-dimensional, then what we will see is that we find a hyperplane in R2, that is, a line, which separates our data according to the variance. So the features that we get out will give us a hyperplane that lies somewhere in between these two circles. Maybe let's change the color to yellow. Does that work? Yes. So this would be our hyperplane, and of course this is not really satisfying.

So what if we could somehow find a transformation out of this R2, which is maybe not good enough to separate the data, into another feature space that helps us to find a linear separation? So the aim of today is to somehow find... That's a little bit too big, I'm sorry; I want you to have this one. So we are looking for some kind of map psi that maps from our input space R^m to some feature space that allows us to separate the data. So this is our aim for today.

And just one possible example of what we could do with our circles is this: let's imagine that we go into the third dimension, where our data becomes separable in terms of a hyperplane. And what could this look like? Well, we could now have a plane somewhere here; this would be a two-dimensional plane lying between these point sets and separating them. So this would be our goal.

And how could we do this? Well, if we go back one slide and look at our two-dimensional data, then of course one possibility would be to assume that we have a center point somewhere here, and then to measure the distance from every data point to this center point. This would be this distance. Then we take the square of the distance, which gives us a positive function that is quadratic, since we are taking the Euclidean distance. And of course, this induces such a three-dimensional plot. Here, in a 3D feature space, taking x, y, and this squared distance as features, we can find a separating hyperplane that gives us what we are looking for. Okay.
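
As a small sketch of this construction (the toy data, the choice of the origin as center point, and the NumPy code are my own illustration, not part of the lecture), the two circles can be lifted into a 3D feature space by appending the squared distance to the center as a third coordinate:

```python
import numpy as np

rng = np.random.default_rng(0)

def circle(radius, n, noise=0.05):
    """Sample n noisy points on a circle of the given radius around the origin."""
    angles = rng.uniform(0.0, 2.0 * np.pi, size=n)
    points = radius * np.column_stack([np.cos(angles), np.sin(angles)])
    return points + noise * rng.normal(size=points.shape)

inner = circle(1.0, 100)                     # cluster one
outer = circle(3.0, 100)                     # cluster two
X = np.vstack([inner, outer])                # data in the input space R^2

# feature map psi: R^2 -> R^3, appending the squared distance to the center
Z = np.column_stack([X[:, 0], X[:, 1], np.sum(X**2, axis=1)])

# in the feature space, a horizontal plane (a threshold on the third coordinate,
# anywhere between the squared radii 1 and 9) separates the two circles
threshold = 4.0
print(np.all(Z[:100, 2] < threshold), np.all(Z[100:, 2] > threshold))  # True True
```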

So our aim is now to find such a transformation psi into some other feature space. Once we do this, we can compute a PCA in this feature space that separates our data nicely according to the principal components, and this induces the clustering that we as humans would also find satisfying. Okay. So this is the result of the linear PCA in this feature space after a possibly nonlinear mapping. And the question of today is: how can we get there?
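
Continuing the toy example from the sketch above (again only my own illustration, with the same hypothetical circle data and feature map), this "PCA in the feature space" step is simply the linear PCA from last week applied to the mapped data:

```python
import numpy as np

rng = np.random.default_rng(0)

# rebuild the toy circle data and the feature map psi from the previous sketch
def circle(radius, n, noise=0.05):
    angles = rng.uniform(0.0, 2.0 * np.pi, size=n)
    points = radius * np.column_stack([np.cos(angles), np.sin(angles)])
    return points + noise * rng.normal(size=points.shape)

X = np.vstack([circle(1.0, 100), circle(3.0, 100)])
Z = np.column_stack([X[:, 0], X[:, 1], np.sum(X**2, axis=1)])   # psi(X)

# linear PCA in the 3D feature space: center, covariance, eigendecomposition
Z_centered = Z - Z.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(Z_centered, rowvar=False))

# project onto the leading principal component
w = eigvecs[:, np.argmax(eigvals)]
scores = Z_centered @ w

# the leading component is dominated by the squared-distance feature, so a simple
# threshold on the scores already recovers the inner/outer clustering
labels = (scores > 0).astype(int)
print(labels[:100].mean(), labels[100:].mean())   # 0.0 and 1.0 (or vice versa)
```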

Well, first of all, the idea is that we would like to transform the given input data x_i, of which we have capital N, as you remember, via a map psi to some higher-dimensional vector space. Higher-dimensional is good because that gives us more flexibility to find separating hyperplanes for classifiers later on. So it really makes sense to go to a feature space
