3 - On solving/learning differential equations with kernels (H. Owhadi, Caltech) [ID:36398]

Yes, so I've designed it to be slow-paced so you can essentially interrupt me at any

time. So this talk is going to be an invitation to explore the role of kernel methods in learning

and solving differential equations. Now before I get into the main topic of this talk, I will

give a short reminder on scalar-valued kernels to make sure that everyone is on the same page.

I will do so in the setting of the following interpolation problem in which you try to

approximate an unknown target function f dagger mapping some input space x to the real line given

that f dagger of capital X is equal to capital Y. Here I write capital X and capital Y for the n-dimensional vectors whose entries

are the input and output data points, and I write f dagger of capital X for the n-dimensional vector

whose entries are the images of the input data points under f dagger.
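Written out (the slide formulas are not reproduced in the transcript, so this is only a sketch of the setup just described):

\[
\text{find } f \approx f^\dagger : \mathcal{X} \to \mathbb{R} \quad \text{given} \quad f^\dagger(X) = Y,
\]
\[
X = (x_1, \dots, x_n), \qquad Y = (y_1, \dots, y_n), \qquad f^\dagger(X) := \big( f^\dagger(x_1), \dots, f^\dagger(x_n) \big).
\]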

Now there are four equivalent ways to describe kernel-based solutions to this problem

and these are based on the following four equivalent ways of defining such kernels.

The first one is to define a scalar-valued kernel as a function k mapping the input space x times x

to the real line such that for all integers m and all points x1, ..., xm in the input space

the m by m matrix with the following entries is symmetric and positive semi-definite.
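In symbols, with the matrix entries being the pairwise kernel evaluations (as presumably shown on the slide):

\[
\big( k(x_i, x_j) \big)_{i,j=1}^{m} \text{ is symmetric and positive semi-definite for all } m \in \mathbb{N} \text{ and all } x_1, \dots, x_m \in \mathcal{X}.
\]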

Now equivalently k is a scalar-valued kernel if and only if there exists a Hilbert space F,

known as a feature space, and a function psi mapping the input space to F, known as a feature map, such that k of x, x prime

is equal to the inner product in the space F between psi of x and psi of x prime.
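In symbols:

\[
k(x, x') = \big\langle \psi(x), \psi(x') \big\rangle_{\mathcal{F}}.
\]

A standard finite-dimensional example (not from the talk): on the real line the polynomial kernel k(x, x') = (1 + x x')^2 admits the feature map psi(x) = (1, sqrt(2) x, x^2) into F = R^3, since

\[
\big\langle (1, \sqrt{2}\,x, x^2), (1, \sqrt{2}\,x', x'^2) \big\rangle_{\mathbb{R}^3} = 1 + 2 x x' + x^2 x'^2 = (1 + x x')^2 .
\]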

Now this is equivalent to the existence of a Hilbert space H of functions f mapping the input space

to the real line such that f of x is equal to the inner product in the space H between

f and the function defined by the kernel k supported at the point x. The norm associated

with the inner product in H is known as the RKHS norm associated with the kernel k

and we will also write it as this K norm for ease of notation.
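In symbols, this reproducing property and the associated norm presumably read:

\[
f(x) = \big\langle f, k(x, \cdot) \big\rangle_{\mathcal{H}} \quad \text{for all } f \in \mathcal{H} \text{ and } x \in \mathcal{X}, \qquad \|f\|_{K} := \sqrt{\langle f, f \rangle_{\mathcal{H}}} .
\]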

Now this is equivalent to the existence of a centered Gaussian process xi such that k is the covariance function of that Gaussian

process. Recall that xi is a function mapping the input space to a Gaussian space, which is a Hilbert space of

scalar-valued Gaussian random variables, and that the covariance function of xi is the function

obtained by computing the covariance between xi of x and xi of x prime for all x and x prime.
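In symbols, writing xi for the centered process:

\[
k(x, x') = \operatorname{Cov}\big( \xi(x), \xi(x') \big) = \mathbb{E}\big[ \xi(x)\, \xi(x') \big] \quad \text{for all } x, x' \in \mathcal{X} .
\]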

Now these four equivalent definitions lead to four equivalent solutions to our interpolation

problem. The first one is to approximate the target function f dagger with the following

function f, where k is our given kernel, k of capital X, capital X is the n by n matrix with

the following entries, and k of little x, capital X is the n-dimensional vector with the following entries.
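The formula on the slide is not reproduced in the transcript, but given the objects just described it is presumably the standard kernel interpolant

\[
f(x) = k(x, X)\, k(X, X)^{-1} Y, \qquad k(X, X)_{ij} = k(x_i, x_j), \qquad k(x, X)_i = k(x, x_i) .
\]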

The second one is to approximate the target function with the following inner product involving

the feature map psi associated with the kernel k and some coefficient c living in the feature space

F, identified by minimizing its norm subject to the interpolation constraints. Therefore you can

think of kernel interpolation, also known as kriging, as linear interpolation in feature space,

and the main idea is to first map the data to a Hilbert space via a possibly non-linear

feature map and then to linearly approximate the target function in that feature space.
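In symbols, this second, feature-space formulation presumably reads

\[
f(x) = \big\langle \psi(x), c \big\rangle_{\mathcal{F}}, \qquad c = \arg\min \big\{ \|c'\|_{\mathcal{F}} \ :\ \langle \psi(x_i), c' \rangle_{\mathcal{F}} = y_i \ \text{ for } i = 1, \dots, n \big\} .
\]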

The third one is to approximate the target function with the minimizer

of the following problem, in which the norm to be minimized is the RKHS

norm defined by the kernel k. Recall that, since these methods are equivalent, this norm also

defines the kernel k. This approach is also known as optimal recovery in numerical approximation

and it can be traced back to the work of Micchelli and Rivlin.
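In symbols, this third, optimal-recovery formulation presumably reads

\[
f = \arg\min \big\{ \|g\|_{K} \ :\ g \in \mathcal{H}, \ g(x_i) = y_i \ \text{ for } i = 1, \dots, n \big\} .
\]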

The fourth one is to approximate the target function by the expectation of the Gaussian process

with covariance function k conditioned on the interpolation constraints.
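Since all four formulations produce the same interpolant, a minimal numerical sketch can illustrate them at once. The following Python snippet (my own illustration, not code from the talk) implements f(x) = k(x, X) k(X, X)^{-1} Y, which is also the conditional mean of the centered Gaussian process with covariance k; the Gaussian kernel and its lengthscale are arbitrary choices for the example, since the talk does not fix a particular kernel.

```python
import numpy as np

def gaussian_kernel(a, b, lengthscale=0.5):
    """k(a, b) = exp(-|a - b|^2 / (2 * lengthscale^2)) for 1-D input arrays a, b."""
    d = a[:, None] - b[None, :]
    return np.exp(-d**2 / (2.0 * lengthscale**2))

def kernel_interpolant(X, Y, kernel):
    """Return x -> k(x, X) k(X, X)^{-1} Y: the kernel interpolant of the data
    (X, Y), equivalently the conditional mean of the centered Gaussian process
    with covariance `kernel` given the observations Y at the points X."""
    K = kernel(X, X)                          # n-by-n Gram matrix k(X, X)
    jitter = 1e-10 * np.eye(len(X))           # tiny regularization for numerical stability
    coeffs = np.linalg.solve(K + jitter, Y)   # k(X, X)^{-1} Y
    return lambda x: kernel(np.atleast_1d(np.asarray(x, dtype=float)), X) @ coeffs

# Usage: recover f_dagger(x) = sin(2 pi x) from 8 point evaluations.
X = np.linspace(0.0, 1.0, 8)
Y = np.sin(2.0 * np.pi * X)
f = kernel_interpolant(X, Y, gaussian_kernel)
print(np.max(np.abs(f(X) - Y)))               # ~0: the interpolation constraints hold
print(f(0.37), np.sin(2.0 * np.pi * 0.37))    # interpolant vs. target at a new point
```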

Okay so why are kernel methods relevant to numerical approximation? Well as observed by

Sard, Larkin, Diaconis and many others who have investigated intriguing similarities between Gaussian

process regression and numerical approximation, most numerical approximation methods are actually

kernel interpolation methods. For example the cardinal splines of Schoenberg are optimal recovery

splines, that is, kernel interpolants obtained from pointwise measurements and

defined by polynomial kernels. This is also true for the polyharmonic splines of Harder, Desmarais

and Duchon, which can be identified as optimal recovery splines. To describe this consider the

Part of a video series:

Accessible via

Open access

Duration

00:40:50 min

Recording date

2021-10-08

Uploaded on

2021-10-14 13:16:04

Language

en-US
