5 - FAU MoD Course (5/5): From Condensation to Generalization Theory [ID:57703]

I think I can start now. So today is our final lecture, and we finally get to a topic that is most relevant to the key mystery of deep neural networks: generalization.

As I told you before, the generalization puzzle is probably the most prominent problem out there. It was raised by Leo Breiman at least as early as 1995, and now, about thirty years later, we still have not solved it. We did not start this course by looking at the generalization problem directly. But after all these phenomena, all this preparation, it is time to get into it. We can now set aside the existing frameworks that are usually used to discuss generalization theory and instead ask what kind of description of generalization we could build on the solid phenomena, the solid understanding, we have established for this system.

So yes, this is our last lecture, and condensation is still the key phenomenon that inspires these new understandings of generalization. First, having observed the condensation phenomenon, we know that there are conditions under which it occurs. By tuning the scale of the initialization, I can place the network in a regime with stronger or weaker condensation.
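To make this concrete, here is a minimal sketch, my own illustration rather than code from the lecture, of how one might parameterize the initialization scale of a two-layer ReLU network and later quantify condensation. The exponent gamma and the cosine-similarity diagnostic are assumptions on my part: larger gamma means a smaller initial weight scale, and condensation is diagnosed here by how strongly the neurons' input-weight directions align.

```python
import numpy as np

def init_two_layer(d_in, width, gamma, rng):
    """Initialize a two-layer ReLU net f(x) = a @ relu(W @ x).
    Weights are i.i.d. N(0, width**(-gamma)), so larger gamma means a
    smaller initialization scale (an illustrative convention, not
    necessarily the lecture's exact parameterization)."""
    std = width ** (-gamma / 2)
    W = rng.normal(0.0, std, size=(width, d_in))  # input weights of the hidden neurons
    a = rng.normal(0.0, std, size=(width,))       # output weights
    return W, a

def condensation_score(W, eps=1e-12):
    """Mean absolute cosine similarity between the neurons' input-weight
    directions. Near 1: neurons have condensed onto a few shared
    directions; near 0: directions remain spread out."""
    U = W / (np.linalg.norm(W, axis=1, keepdims=True) + eps)
    C = np.abs(U @ U.T)
    return C[~np.eye(len(C), dtype=bool)].mean()

rng = np.random.default_rng(0)
W, a = init_two_layer(d_in=2, width=100, gamma=2.0, rng=rng)  # small-scale initialization
# At initialization the directions are random, so the score is small.
# The diagnostic is meant to be applied to W *after* training: with a
# small initialization scale, the score is expected to grow toward 1
# as the neurons condense onto a few directions.
print("score at init:", condensation_score(W))
```

This score is only one possible diagnostic; clustering the rows of W, or counting distinct directions up to sign, would serve the same purpose. The point is simply that the initialization scale is an explicit knob, and condensation of the trained weights is something measurable.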

A natural question then follows: what is the real generalization advantage of condensation? In what kind of situations does condensation actually bring benefits for generalization? When we think about this problem, we always need to keep the no free lunch theorem in mind. It tells us not to expect a theoretical argument saying that condensation is always good; there is no way to arrive at a conclusion like that. So all we are trying to establish is for what kind of problem, for what type of problem, condensation can really help. That is the question we should really be asking.

Then there are some basic observations we can make.

For example, we know that condensation can help us avoid one type of overfitting, namely fitting the training data with a highly oscillatory pattern. Such a fit would require too many groups of neurons, far more than the minimum number of groups needed to interpolate the function. This kind of phenomenon is easy to observe.
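To make the phrase "minimum number of groups" concrete, here is a small worked example of my own (not from the lecture): in one dimension, a two-layer ReLU network can interpolate n training points with only n - 1 hidden neurons, one kink at every data point except the last, by matching the slopes of the piecewise-linear interpolant. A highly oscillatory fit of the same points would need many more kinks, hence many more effective neuron groups.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def minimal_relu_interpolant(xs, ys):
    """Return f(x) = ys[0] + sum_i c_i * relu(x - xs[i]) passing through
    all (xs[i], ys[i]), with xs sorted. Each ReLU neuron contributes one
    kink at a data point; its coefficient is the slope change there."""
    slopes = np.diff(ys) / np.diff(xs)      # slope of each linear segment
    coeffs = np.diff(slopes, prepend=0.0)   # slope change at each kink
    def f(x):
        x = np.asarray(x, dtype=float)
        return ys[0] + sum(c * relu(x - xk) for c, xk in zip(coeffs, xs[:-1]))
    return f

xs = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
ys = np.array([0.0, 1.0, 0.5, 2.0, 1.5])
f = minimal_relu_interpolant(xs, ys)
print(np.allclose(f(xs), ys))   # True: 4 ReLU neurons interpolate 5 points
```

A condensed network, in which many neurons share the same input-weight direction, behaves like such a small effective network even when its nominal width is large.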

What more can we say about condensation? If all we care about is a relatively smooth interpolation of the training data, then the frequency principle is already enough to explain this kind of non-overfitting behavior. So in what sense does condensation give us additional benefits for generalization? That is the question we really want to understand.

Before going into the details, I also want to explain what kind of generalization problems really matter for large language models. There is a very important paper called Scaling Laws for Neural Language Models, from OpenAI in 2020. How many of you have read this paper? Anyone? Then I would say you are lagging behind. This paper is the most important paper in AI in that it really tells us there are three factors that are key to the performance of large language models: the model size, the data size, and the computation used for training.

Why is this paper so important? Because these neural networks are largely a black box, and there is no theory that predicts which factors matter most for their performance; we do not have that kind of theoretical argument or solid understanding. So the only thing we can rely on is very careful experiments. In this work, OpenAI compared many different factors and tuned lots of different settings in order to find out which ones matter most for the performance of large language models, and it turned out to be these three.

So what is the relation between these three factors and the performance? Take the training data size in particular: plot the data size on the x-axis and the test loss on the y-axis, both on a logarithmic scale, and the measurements can be fitted by a straight line. In other words, there is what looks like a power-law relation between the test loss and the data size.
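As a reminder of what a power law in log-log scale means operationally: if the test loss behaves like L(D) ≈ (D_c / D)^α, then log L is linear in log D with slope -α. Below is a minimal sketch with synthetic numbers of my own; the constants are only illustrative, chosen to be of the same rough order as those reported in the paper, so consult the paper for the actual fit.

```python
import numpy as np

# Synthetic (data size, test loss) measurements, generated from a power
# law with a little noise; these are toy numbers, not the paper's data.
rng = np.random.default_rng(0)
D = np.array([1e6, 3e6, 1e7, 3e7, 1e8, 3e8, 1e9])
alpha_true, D_c = 0.095, 5e13   # illustrative constants of a plausible order
L = (D_c / D) ** alpha_true * np.exp(rng.normal(0, 0.01, size=D.shape))

# Fit log L = b - alpha * log D by ordinary least squares.
slope, intercept = np.polyfit(np.log(D), np.log(L), deg=1)
alpha_hat = -slope
print(f"fitted exponent alpha ≈ {alpha_hat:.3f}")   # close to 0.095 by construction
```

The same exercise applies to model size and compute; the striking empirical point of the paper is that these straight-line trends persist over many orders of magnitude.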

Accessible via: Open Access
Duration: 01:52:28 min
Recording date: 2025-05-08
Uploaded: 2025-05-09 03:29:05
Language: en-US

Date: Fri. – Thu. May 2 – 8, 2025
FAU MoD Course: Towards a Mathematical Foundation of Deep Learning: From Phenomena to Theory
Session 5: From Condensation to Generalization Theory
Speaker: Prof. Dr. Yaoyu Zhang
Affiliation: Institute of Natural Sciences & School of Mathematical Sciences, Shanghai Jiao Tong University
Organizer: FAU MoD, Research Center for Mathematics of Data at FAU, Friedrich-Alexander-Universität Erlangen-Nürnberg
Overall, this course serves as a gateway to the vibrant field of deep learning theory, inspiring participants to contribute fresh perspectives to its advancement and application.
Session Titles:
1. Mysteries of Deep Learning
2. Frequency Principle/Spectral Bias
3. Condensation Phenomenon
4. From Condensation to Loss Landscape Analysis
5. From Condensation to Generalization Theory
 
