Yeah, first, some of these problems were only treated in a hurry, so today let's go back to this kind of dynamics.
I think, first of all, it's a very easy dynamics, because you can see it's a linear ODE; it's just written in the frequency domain. So let's look at this P here. This P refers to an empirical distribution, which in general is a summation of deltas, because we only sample at discrete points. Let's first think about the case where this P is not there, where there is no empirical distribution. Then this H is just the Fourier transform of the function we care about, the neural network function, and this F, also with no P attached, is the Fourier transform of the target function.
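(As a hedged sketch of the notation, with a normalization that is my own assumption: the empirical distribution is
$$P(x) \;=\; \frac{1}{n}\sum_{i=1}^{n}\delta(x - x_i),$$
and multiplying a function $g$ by $P$ before taking the Fourier transform keeps only its values at the samples,
$$\widehat{gP}(\xi) \;=\; \frac{1}{n}\sum_{i=1}^{n} g(x_i)\,e^{-\mathrm{i}\,\xi\, x_i}.$$
In particular, the quantities H_P and F_P that appear below depend on h and f only through their values at $x_1,\dots,x_n$.)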
neural network function. And then it tells you, okay, the evolution of the neural network function
in the Fourier space, it's kind of actually, or the dynamics of this neural network function
is decoupled in the Fourier space. You can see each of these Fourier modes, for example,
given certain case, you can see their evolution are kind of independent. That they have no
coupling between different frequencies. Then what happens is that each of these terms, for example,
H, initially probably it's a kind of zero function. Therefore, all these frequencies are zero. But
later, since there's this term, arrow term that drives this, the Fourier transform of the H to
the Fourier transform of this target function. And later, all these components in the Fourier
domain will converge to the corresponding kind of Fourier components in the target.
So and all independently. It's just, yeah, please.
H and H_P are different; that is just notation. If I put this small P here, it means the function multiplied by this P. The P tells you that only the information at the empirical samples is observed; the deltas mark that fact. So without this P, you can see it is a very simple dynamics: each component in the Fourier domain converges to the corresponding component of the target, and the rate of convergence is determined by this pre-factor, which decays in a power-law manner in frequency. That means every component converges to the corresponding Fourier component of the target; it is just that the lower frequencies converge faster and the higher frequencies converge more slowly.
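As a hedged sketch of what such a decoupled dynamics looks like (the exact pre-factor depends on the activation and the setup, so the power-law form below is only an illustrative assumption): without the P, each mode would obey something like
$$\partial_t \hat h(\xi, t) \;=\; -\,\gamma(\xi)\,\big(\hat h(\xi,t) - \hat f(\xi)\big), \qquad \gamma(\xi)\sim \frac{C}{|\xi|^{\alpha}}\ \ (\alpha > 0),$$
so that, starting from $\hat h(\xi,0) = 0$,
$$\hat h(\xi,t) \;=\; \hat f(\xi)\,\big(1 - e^{-\gamma(\xi)\,t}\big),$$
and the low-frequency components, with the larger $\gamma(\xi)$, reach the target exponentially faster than the high-frequency ones.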
However, what is really interesting is not that case. It is the case where we only have observations at finitely many samples. That means everything we can observe about the target is its values at the training samples; nowhere else do we know anything about it. And in this error term, we only care whether the neural network output, evaluated at these training samples, matches the target there. So you see, both H_P and F_P only carry the information of these functions at a finite set of points. And with this kind of driving term, what we can see is that there could be some aliasing effect.
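In the same illustrative notation as above (again an assumption about the exact form), the sampled dynamics is driven only by the residuals at the training points,
$$\partial_t \hat h(\xi, t) \;=\; -\,\gamma(\xi)\,\widehat{(h-f)P}(\xi, t) \;=\; -\,\gamma(\xi)\,\frac{1}{n}\sum_{i=1}^{n}\big(h(x_i,t) - f(x_i)\big)\,e^{-\mathrm{i}\,\xi\, x_i},$$
so any two functions that agree at the samples produce exactly the same driving term.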
For example, suppose the target f is a high-frequency oscillation, but we only observe it at these sparse points. Then aliasing can happen, in the sense that h need not equal f: even if h is this yellow line, you still have H_P equal to F_P, because at the training samples the yellow line and the target align perfectly, and therefore the error term becomes zero. So aliasing means that if your data are not dense enough, a lower-frequency Fourier mode can align exactly with a higher-frequency one at the samples. There is then no way to distinguish which of the two it is; both of them are able to explain the observations on this sparse grid.
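A minimal numerical sketch of this aliasing (the grid size, the frequencies, and the helper name sampled_fourier are illustrative choices, not taken from the slides):

```python
import numpy as np

def sampled_fourier(g_vals, xs, xis):
    """Fourier transform of g*P with P = (1/n) * sum_i delta(x - x_i):
    it uses only the values of g at the sample points xs."""
    n = len(xs)
    return np.array([np.sum(g_vals * np.exp(-1j * xi * xs)) / n for xi in xis])

n = 8                                  # number of sparse training samples
xs = np.arange(n) / n                  # uniform grid on [0, 1)
k_low, k_high = 2, 2 + n               # these two frequencies alias on this grid

f = lambda x: np.sin(2 * np.pi * k_high * x)   # high-frequency target
h = lambda x: np.sin(2 * np.pi * k_low * x)    # low-frequency "yellow line"

# The two functions are different, yet they coincide at every sample point ...
assert np.allclose(f(xs), h(xs))

# ... so their sampled Fourier transforms agree at every frequency,
# and the error term, built from h - f at the samples, is identically zero.
xis = 2 * np.pi * np.arange(-20, 21)
assert np.allclose(sampled_fourier(h(xs), xs, xis),
                   sampled_fourier(f(xs), xs, xis))
assert np.allclose(sampled_fourier(h(xs) - f(xs), xs, xis), 0.0)
print(f"frequencies {k_low} and {k_high} are indistinguishable on the {n}-point grid")
```

On an n-point uniform grid, the frequencies k and k + n take identical values at every sample, which is exactly the ambiguity the dynamics has to resolve.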
Then we ask: since we can use lower-frequency modes to explain the training data, and we can also use higher, or much higher, frequency modes to explain them, what kind of solution will the neural network learn? Now, you see, this dynamics tells you that the lower-frequency part is prioritized. That means, although the higher-frequency modes
FAU MoD Course: Towards a Mathematical Foundation of Deep Learning: From Phenomena to Theory
Session 3: Condensation Phenomenon
Speaker: Prof. Dr. Yaoyu Zhang
Affiliation: Institute of Natural Sciences & School of Mathematical Sciences, Shanghai Jiao Tong University
1. Mysteries of Deep Learning
2. Frequency Principle/Spectral Bias
3. Condensation Phenomenon
4. From Condensation to Loss Landscape Analysis
5. From Condensation to Generalization Theory