Okay, thank you, Daniel and Leon, and thanks for facilitating this family reunion.
Let's try to keep this familiar spirit, so please interrupt me at any time if you have any questions.
And if you want to turn on your video, even better.
Okay, so today I'm going to be presenting some of the work that I've been doing in my group.
It's mostly the result of work with some of my PhD students, Zhengdao Chen, Aaron Zweig, Samy Jelassi, and Luca Venturi, and also with collaborators: Eric Vanden-Eijnden, my colleague at Courant, and Grant Rotskoff, who just started as an assistant professor at Stanford.
The starting point of the talk is what you would expect from a deep learning seminar series: re-emphasizing how much deep learning has changed empirical data science in the last 10 years.
There are now many, many domains where these models have completely transformed the field.
It's a real revolution, and it started only about 10 years ago.
It began in computer vision, but one could almost say that it is now almost everywhere in computational science.
We have advances in computational biology and protein folding, in quantum chemistry, even in scientific computing, in geometric simulation, in robotics, you name it.
At this point, if one wants to start formalizing this mathematically and posing interesting questions, the starting point is really to observe that all these situations have something in common: we have a high-dimensional input space.
The observations live in a high-dimensional space, with possibly complicated structures that need to be extracted.
And rather than relying on predefined physical knowledge of the problem, deep learning architectures introduce a paradigm shift: the representation phi is now going to be learned, completely optimized from the data.
And the technique by which this representation is learned is, for all practical purposes, the most naive optimization algorithm you could imagine, based purely on local updates: you start from an arbitrary representation and keep improving it locally, trying to minimize some objective function.
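To make this local-update principle concrete, here is a minimal sketch (my own illustration, not code from the talk) of plain gradient descent on a small two-layer network; the width, step size, and synthetic data are arbitrary choices for demonstration:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy supervised data: high-dimensional inputs x, targets from a stand-in "unknown" f*.
d, n, width = 10, 200, 64
X = rng.standard_normal((n, d))
y = np.tanh(X @ rng.standard_normal(d))    # plays the role of f*(x)

# Arbitrary initial representation: two-layer network f(x) = a . relu(W x).
W = rng.standard_normal((width, d)) / np.sqrt(d)
a = rng.standard_normal(width) / np.sqrt(width)

lr = 0.1
for step in range(500):
    H = np.maximum(X @ W.T, 0.0)           # hidden representation phi(x)
    err = H @ a - y                        # residual of the squared loss
    loss = 0.5 * np.mean(err ** 2)
    # Local updates: exact gradients of the loss w.r.t. a and W.
    grad_a = H.T @ err / n
    grad_W = ((err[:, None] * (H > 0)) * a).T @ X / n
    a -= lr * grad_a
    W -= lr * grad_W
    if step % 100 == 0:
        print(step, loss)
```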
And so the goal of the talk is to try to formalize how, when, and why these learning systems can approximate high-dimensional functions from data.
The most important word here is high dimensionality; that's the motto of the talk today.
I'm going to be focusing on the simplest, and maybe most boring, aspect of learning, which is supervised learning, formalized using classical statistical learning language.
We start from some high-dimensional data distribution; samples from it are the X's, the inputs to the model.
The model is then asked to predict some unknown function; let's call it f*.
So f* could be the label that you want to predict, or the location of some object, or maybe the energy of a molecule, something like that.
And this function f* is approximated, or hopefully learned, using an approximation model.
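In symbols, a textbook way to write this setup, consistent with what is described here (the squared loss is just one concrete choice), is:

```latex
% Data: inputs x_1, \dots, x_n drawn i.i.d. from a high-dimensional distribution
% \mu on \mathbb{R}^d, with targets given by an unknown function f^*.
% Learning selects a model f_\theta by minimizing the empirical risk:
\hat\theta \;\in\; \arg\min_{\theta}\; \frac{1}{n} \sum_{i=1}^{n}
    \ell\bigl(f_\theta(x_i),\, f^*(x_i)\bigr),
\qquad \text{e.g. } \ell(y, y') = \tfrac{1}{2}\,(y - y')^2,
% in the hope that the population risk
% \mathbb{E}_{x \sim \mu}\, \ell(f_\theta(x), f^*(x)) is small as well.
```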
Joan Bruna on "Mathematical aspects of neural network approximation and learning"
High-dimensional learning remains an outstanding phenomenon in which experimental evidence outpaces our current mathematical understanding. Neural networks provide a rich yet intricate class of functions with the statistical ability to break the curse of dimensionality, and in which physical priors can be tightly integrated into the architecture to improve sample efficiency. Despite these advantages, an outstanding theoretical challenge for these models is computational: providing an analysis that explains successful optimization and generalization in the face of existing worst-case computational hardness results.
In this talk, we will describe snippets of this challenge, covering optimization and approximation respectively. First, we will focus on the framework that lifts parameter optimization to an appropriate measure space. We will overview existing results that guarantee global convergence of the resulting Wasserstein gradient flows, and present our recent results that study typical fluctuations of the dynamics around their mean-field evolution, as well as extensions of this framework beyond vanilla supervised learning to account for symmetries in the function. Next, we will discuss the role of depth in terms of approximation, and present novel results establishing so-called ‘depth separation’ for a broad class of functions. We will conclude by discussing consequences in terms of optimization, highlighting current and future mathematical challenges.
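For context, the "lift to measure space" referred to above is, in the standard mean-field analyses of two-layer networks, the following construction (generic notation, not taken verbatim from the talk):

```latex
% A width-N two-layer network with parameters \theta_i = (a_i, w_i)
% can be written as an integral against its empirical parameter measure:
f_N(x) \;=\; \frac{1}{N} \sum_{i=1}^{N} a_i\, \sigma(\langle w_i, x\rangle)
       \;=\; \int a\, \sigma(\langle w, x\rangle)\, d\mu_N(a, w),
\qquad
\mu_N \;=\; \frac{1}{N} \sum_{i=1}^{N} \delta_{(a_i, w_i)} .
% As N grows, gradient descent on the parameters corresponds to a Wasserstein
% gradient flow on the risk F(\mu), viewed as a functional of the measure \mu:
\partial_t \mu_t \;=\; \nabla \cdot \Bigl( \mu_t\, \nabla \frac{\delta F}{\delta \mu}[\mu_t] \Bigr).
```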
Joint work with: Zhengdao Chen, Grant Rotskoff, Eric Vanden-Eijnden, Luca Venturi, Samy Jelassi and Aaron Zweig.