Okay, thank you, Daniel and Leon, and thanks for facilitating this family reunion.
Let's try to keep this familiar spirit, so please interrupt me at any time if you have any questions.
And if you want to turn on your video, even better.
Okay, so today I'm going to be presenting some of the work that I've been doing in my group.
It's mostly the result of work with some of my PhD students, Zhengdao Chen, Aaron Zweig, Samy Jelassi, and Luca Venturi, and also with collaborators: Eric Vanden-Eijnden, my colleague at Courant, and Grant Rotskoff, who just started as an assistant professor at Stanford.
The starting point of the talk is what you would expect from a deep learning seminar series: re-emphasizing how much deep learning has changed empirical data science in the last 10 years.
There are now many, many domains where these models have completely transformed the field.
It's a real revolution, and it started only about 10 years ago.
It began in computer vision, but one could almost say that it is now almost everywhere in computational science.
We have advances in computational biology and protein folding, in quantum chemistry, even in scientific computing, in geometric simulation, in robotics, you name it.
At this point, if one wants to start formalizing this mathematically and posing interesting questions, the starting point is really to observe that all these situations have something in common: we have a high-dimensional input space.
The observations live in a high-dimensional space, with possibly complicated structures that need to be extracted.
And rather than relying on predefined physical knowledge of the problem, deep learning architectures introduce a paradigm shift: the representation phi is now going to be learned, completely optimized from the data.
And the technique by which this representation is learned is, for all practical purposes, the most naive optimization algorithm you could imagine, based purely on local updates: you start from an arbitrary representation and keep improving it locally, trying to minimize some objective function.
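To make this local-update principle concrete, here is a minimal sketch (my own illustration, not code from the talk) of plain gradient descent on a small two-layer network; the width, step size, and synthetic data are arbitrary choices for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy supervised data: high-dimensional inputs x, targets from a stand-in "unknown" f*.
d, n, width = 10, 200, 64
X = rng.standard_normal((n, d))
y = np.tanh(X @ rng.standard_normal(d))    # plays the role of f*(x)

# Arbitrary initial representation: two-layer network f(x) = a . relu(W x).
W = rng.standard_normal((width, d)) / np.sqrt(d)
a = rng.standard_normal(width) / np.sqrt(width)

lr = 0.1
for step in range(500):
    H = np.maximum(X @ W.T, 0.0)           # hidden representation phi(x)
    err = H @ a - y                        # residual of the squared loss
    loss = 0.5 * np.mean(err ** 2)
    # Local updates: exact gradients of the loss w.r.t. a and W.
    grad_a = H.T @ err / n
    grad_W = ((err[:, None] * (H > 0)) * a).T @ X / n
    a -= lr * grad_a
    W -= lr * grad_W
    if step % 100 == 0:
        print(step, loss)
```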
And so the goal of the talk is to try to formalize how, when, and why these learning systems can approximate high-dimensional functions from data.
The most important word here is high dimensionality; that's the motto of the talk today.
I'm going to be focusing on the simplest, and maybe most boring, aspect of learning, which is supervised learning, formalized using classical statistical learning language.
We start from some high-dimensional data distribution; samples from it are the X's, the inputs to the model.
The model is then asked to predict some unknown function; let's call it f*.
So f* could be the label that you want to predict, or the location of some object, or maybe the energy of a molecule, something like that.
And this function f* is approximated, or hopefully learned, using an approximation model.
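In symbols, a textbook way to write this setup, consistent with what is described here (the squared loss is just one concrete choice), is:

```latex
% Data: inputs x_1, \dots, x_n drawn i.i.d. from a high-dimensional distribution
% \mu on \mathbb{R}^d, with targets given by an unknown function f^*.
% Learning selects a model f_\theta by minimizing the empirical risk:
\hat\theta \;\in\; \arg\min_{\theta}\; \frac{1}{n} \sum_{i=1}^{n}
    \ell\bigl(f_\theta(x_i),\, f^*(x_i)\bigr),
\qquad \text{e.g. } \ell(y, y') = \tfrac{1}{2}\,(y - y')^2,
% in the hope that the population risk
% \mathbb{E}_{x \sim \mu}\, \ell(f_\theta(x), f^*(x)) is small as well.
```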
Joan Bruna on "Mathematical aspects of neural network approximation and learning"
High-dimensional learning remains an outstanding phenomenon in which experimental evidence outpaces our current mathematical understanding. Neural networks provide a rich yet intricate class of functions with the statistical ability to break the curse of dimensionality, and in which physical priors can be tightly integrated into the architecture to improve sample efficiency. Despite these advantages, an outstanding theoretical challenge for these models is computational: providing an analysis that explains successful optimization and generalization in the face of existing worst-case computational hardness results.
In this talk, we will describe snippets of this challenge, covering optimization and approximation respectively. First, we will focus on the framework that lifts parameter optimization to an appropriate measure space. We will overview existing results that guarantee global convergence of the resulting Wasserstein gradient flows, and present our recent results that study typical fluctuations of the dynamics around their mean-field evolution, as well as extensions of this framework beyond vanilla supervised learning to account for symmetries in the function. Next, we will discuss the role of depth in terms of approximation, and present novel results establishing so-called ‘depth separation’ for a broad class of functions. We will conclude by discussing consequences in terms of optimization, highlighting current and future mathematical challenges.
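For context, the "lift to measure space" referred to above is, in the standard mean-field analyses of two-layer networks, the following construction (generic notation, not taken verbatim from the talk):

```latex
% A width-N two-layer network with parameters \theta_i = (a_i, w_i)
% can be written as an integral against its empirical parameter measure:
f_N(x) \;=\; \frac{1}{N} \sum_{i=1}^{N} a_i\, \sigma(\langle w_i, x\rangle)
       \;=\; \int a\, \sigma(\langle w, x\rangle)\, d\mu_N(a, w),
\qquad
\mu_N \;=\; \frac{1}{N} \sum_{i=1}^{N} \delta_{(a_i, w_i)} .
% As N grows, gradient descent on the parameters corresponds to a Wasserstein
% gradient flow on the risk F(\mu), viewed as a functional of the measure \mu:
\partial_t \mu_t \;=\; \nabla \cdot \Bigl( \mu_t\, \nabla \frac{\delta F}{\delta \mu}[\mu_t] \Bigr).
```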
Joint work with: Zhengdao Chen, Grant Rotskoff, Eric Vanden-Eijnden, Luca Venturi, Samy Jelassi and Aaron Zweig.