4 - Michaël Fanuel: Diversity sampling in kernel method [ID:14964]

Okay, thank you very much for the nice introduction and your invitation.

It's a great pleasure to do this seminar, especially after the very good talk of Lorenzo

Rosasco.

So, I'll elaborate a bit more on sampling in kernel methods, I mean about sampling approximations related to Nyström methods, and hopefully we can also give a slightly different view on it.

So also from a more practical perspective from time to time.

So yeah, it's joint work between myself, my collaborator Joachim Schreurs, and my advisor Johan Suykens, and it's available on arXiv.

So let me start with giving a brief outline of the talk.

So I'll basically explain basics of kernel methods.

So to set basically to set the definitions and the notations, and also explain a bit

more about regularization.

Then I'll review three sampling methods which are used in the context of the Nyström approximation.

So I'm very happy that the Nyström approximation was explained in a pedagogical way in previous talks, so you will maybe understand it a bit better now.

So the first of the three sampling methods is uniform sampling, which is of course one of the simplest in practice, so very scalable, but where the data points are sampled independently.
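As a concrete illustration (a minimal sketch, not code from the talk; the data, kernel bandwidth and landmark count are my own assumptions), uniform-sampling Nyström picks m landmark indices uniformly at random and approximates the kernel matrix as K ≈ C W⁺ Cᵀ:

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    # Pairwise squared distances, then the Gaussian kernel.
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))          # toy data, assumed for illustration
K = gaussian_kernel(X, X)

# Uniform sampling: m landmarks drawn uniformly, without replacement.
m = 50
idx = rng.choice(len(X), size=m, replace=False)

# Nystrom approximation: K ~= C W^+ C^T with C = K[:, idx], W = K[idx, idx].
C = K[:, idx]
W = K[np.ix_(idx, idx)]
K_nys = C @ np.linalg.pinv(W) @ C.T

err = np.linalg.norm(K - K_nys) / np.linalg.norm(K)
print(f"relative Frobenius error: {err:.3f}")
```

The whole approximation only ever touches the m sampled columns of K, which is where the scalability comes from.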

So yeah, the second method is a refined method which was introduced earlier in some of the

paper that Lorenzo already mentioned, which so-called rich leverage score sampling.

So it's an improvement of uniform sampling.

It's more difficult to calculate those leverage scores.

So there is a lot of very good approximation scheme for calculating those leverage scores.

And it's basically sampling points according to a certain score, which tells you how unique the sampled point is; I'll explain this better in the sequel.

And so you still sample points in an independent way, each of them independently from the others.

And so the third method that I also discuss a bit is determinantal point processes (DPPs).

So it's even more difficult to sample a good subset from a DPP, difficult in the sense of computationally expensive, but it's no longer independent sampling.

So the samples come, but with some repulsion between the different points.

So that somehow can be interesting in certain contexts that I will explain.

And you know, because of this repulsion property, this determinantal sampling is often called diversity sampling.
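The repulsion can be seen directly from the definition: for an L-ensemble DPP, the probability of a subset S is proportional to det(L_S), the determinant of the kernel submatrix indexed by S, and determinants shrink when the corresponding points are similar. A small sketch (the three points are an assumed toy example):

```python
import numpy as np

# Two close points and one far-away point on the line.
x = np.array([0.0, 0.1, 3.0])
L = np.exp(-(x[:, None] - x[None, :]) ** 2)  # Gaussian similarity matrix

def subset_weight(S):
    # Unnormalized DPP probability of selecting exactly the subset S.
    return np.linalg.det(L[np.ix_(S, S)])

close_pair = subset_weight([0, 1])   # two similar points
spread_pair = subset_weight([0, 2])  # two dissimilar points
print(close_pair, spread_pair)
```

The near-duplicate pair gets an almost-zero determinant while the spread-out pair gets a weight close to one, which is exactly the diversity-promoting behaviour of the sampler.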

So I'll explain a bit the link between diversity of the sample and the implicit regularization

that it brings into the learning problem.

And this will be illustrated by theoretical results, theoretical identities I mean, but also by some illustrations on public datasets from the UCI repository.

So we'll discuss the Nyström approximation error, kernel PCA and kernel ridge regression.

So let me first introduce some notation.

So it's still the classical framework that Lorenzo Rosasco introduced.

So in this talk you are given a strictly positive kernel function which is continuous, and I assume here that it is strictly positive definite.

So for instance, the Gaussian kernel is such a kernel, but you can also, for instance,

remove the square in the argument of the exponential.

And this gives you a Laplace-like kernel, which also satisfies this assumption.
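For concreteness, the two kernels can be sketched as follows (a minimal sketch; the bandwidth parameter sigma is my own assumption):

```python
import numpy as np

def gaussian_kernel(x, y, sigma=1.0):
    # k(x, y) = exp(-||x - y||^2 / (2 sigma^2)): strictly positive definite.
    return np.exp(-np.sum((x - y) ** 2) / (2 * sigma ** 2))

def laplace_kernel(x, y, sigma=1.0):
    # Removing the square in the exponent gives a Laplace-like kernel,
    # which is also strictly positive definite.
    return np.exp(-np.linalg.norm(x - y) / sigma)

x, y = np.array([0.0, 0.0]), np.array([1.0, 1.0])
print(gaussian_kernel(x, y), laplace_kernel(x, y))
```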

Accessible via: Open access

Duration: 00:42:47 min

Recording date: 2020-05-04

Uploaded on: 2020-05-05 11:36:30

Language: en-US

A well-known technique for large-scale kernel methods is the Nyström approximation. Based on a subset of landmarks, it gives a low-rank approximation of the kernel matrix, and is known to provide a form of implicit regularization. We will discuss the impact of sampling diverse landmarks for constructing the Nyström approximation in supervised and unsupervised problems. In particular, three methods will be considered: uniform sampling, leverage score sampling and Determinantal Point Processes (DPP). The implicit regularization due to the diversity of the landmarks will be made explicit by numerical simulations and analysed further in the case of DPP sampling by some theoretical results.
