Hello and welcome, everybody, to today's lecture. Last week we discussed linear principal component analysis, in short PCA, and essentially we said that whenever we have input data in some data space, we simply compute the covariance matrix of this data, and the eigenvectors of this matrix, or linear operator, give us the directions of our data, while the associated eigenvalues encode the variance in those directions. This was essentially the core idea of principal component analysis.
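As a quick reminder, here is a minimal sketch of that computation in Python; the toy data and the variable names are my own assumptions, chosen just for illustration:

```python
import numpy as np

# Toy data set: N points in an m-dimensional input space (made up for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))               # shape (N, m)

# Center the data and form the covariance matrix of the input space.
X_centered = X - X.mean(axis=0)
C = X_centered.T @ X_centered / X.shape[0]  # shape (m, m)

# Eigendecomposition of the symmetric covariance matrix: the eigenvectors give the
# directions of the data, the eigenvalues the variance along those directions.
eigenvalues, eigenvectors = np.linalg.eigh(C)

# Sort by decreasing variance; the leading eigenvectors are the principal components.
order = np.argsort(eigenvalues)[::-1]
principal_components = eigenvectors[:, order]
explained_variance = eigenvalues[order]
```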
At the end we also saw that linear PCA has some drawbacks and disadvantages, especially its computational complexity when the dimension of the input space is high, and also that it can only extract linear features. That means whenever your data is distributed in such a way that you cannot separate it with a linear hyperplane in the input space, linear PCA will not give you features that are very useful for machine learning. So today's topic will be a little bit more sophisticated: we will talk about kernel PCA, the kernel principal component analysis, and see how we can alleviate some of these problems. Before we start with the mathematics, let me give you a short motivation of what we will do today.
of what we will do today. Let's say we have this kind of artificially introduced data,
which is lying on two circles, and somehow we would like to cluster this data, and we
as a human, we don't have any problems in that. We can see, okay, this is an inner circle
and an outer circle, so we would easily draw two clusters and say this here would be cluster
one, and the outer one would be cluster two, for example. So that's what we humans can
do, but how can we introduce this to a computer or to a machine in terms of mathematics? Well,
Well, if we use our linear PCA on this data in the input space, which is two-dimensional, then what we will see is that we find a direction in R2 along which the variance is maximal, and the features we get out correspond to a hyperplane that runs somewhere in between these two circles. Let me change the color to yellow: so this would be our hyperplane, and of course this is not really satisfying. So what if we could find a transformation out of this R2, which is apparently not good enough to separate the data, into another feature space that does allow a linear separation? That is what we are after: we are looking for some kind of map psi that maps from our input space Rm to some feature space that allows us to separate the data. So this is our aim for today.
Just one possible example of what we could do with our circles: let's imagine that we go into the third dimension, where our data becomes separable by a hyperplane. What could this look like? Well, we could now have a plane somewhere here, a two-dimensional plane lying between these two point sets and separating them. So this would be our goal. And how could we do this?
Well, if we go back one slide and look at our two-dimensional data, then one possibility would be to assume that we have a center point somewhere here, and then to measure the distance from every data point to that center point; this would be this distance. Then we take the square of that distance, which gives us a positive function that is quadratic, since we are taking the Euclidean distance. And of course this induces exactly such a three-dimensional plot. In this 3D feature space, taking x, y, and the squared distance as the features, we can now find a separating hyperplane that gives us what we are looking for. Okay.
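To make this concrete, here is a small sketch of that construction; the center point, radii, and noise level are assumptions I chose just to reproduce the two-circle picture:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_circle(radius, n, noise=0.05):
    """Sample n noisy points on a circle around the origin (our assumed center point)."""
    angles = rng.uniform(0.0, 2.0 * np.pi, n)
    points = radius * np.column_stack((np.cos(angles), np.sin(angles)))
    return points + rng.normal(scale=noise, size=points.shape)

# Two concentric circles in the 2D input space.
X = np.vstack((make_circle(1.0, 100), make_circle(3.0, 100)))

# Feature map psi: R^2 -> R^3, appending the squared Euclidean distance
# to the center point as a third coordinate.
center = np.array([0.0, 0.0])
squared_distance = np.sum((X - center) ** 2, axis=1, keepdims=True)
Psi = np.hstack((X, squared_distance))              # shape (N, 3)

# Linear PCA in the 3D feature space.
Psi_centered = Psi - Psi.mean(axis=0)
C = Psi_centered.T @ Psi_centered / Psi.shape[0]
eigenvalues, eigenvectors = np.linalg.eigh(C)
leading = eigenvectors[:, np.argmax(eigenvalues)]   # direction of largest variance

# Projecting onto the leading principal component separates the two circles,
# so thresholding this single feature recovers the inner/outer clustering
# (up to a swap of the two labels, since the eigenvector sign is arbitrary).
scores = Psi_centered @ leading
labels = (scores > 0).astype(int)
```

On this toy data the squared-distance coordinate carries far more variance than x and y, so the leading component is essentially that radial feature, and a simple threshold on its projection separates the inner circle from the outer one.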
So our aim is now to find such a transformation psi into some other feature space. Once we have it, we can compute a PCA in this feature space that separates our data nicely according to the principal components, and this gives us the clustering that we as humans would also find satisfying. So this is the result of linear PCA in the feature space, after a possibly nonlinear mapping, and the question of today is: how can we get there?
Well, first of all, the idea is that we would like to transform the given input data x_i, of which we have capital N, as you remember, via a map psi to some higher-dimensional vector space. Higher-dimensional is good, because that gives us more flexibility to find separating hyperplanes for classifiers later on. So it really makes sense to go to a feature space