So I think I can start now. Today is actually our second lecture that is closely related to the theoretical understanding of the condensation phenomenon. As you can see, among these five lectures there are three about condensation. That is because condensation is, on the one hand, a highly nonlinear phenomenon, and on the other hand it is quite complicated. First of all, it is something you can observe during the training dynamics, so it is a dynamical phenomenon. On the other hand, if you ask why we observe these dynamics for the gradient flow over the loss landscape, then there must be some structure in the loss landscape that leads you to the condensation phenomenon. And third, if condensation really happens, it must give you a certain benefit on a certain class of targets; that is the topic I will talk more about tomorrow. So today I am focusing on the loss landscape that supports condensation.
Yeah. Oh, condensation is what I described yesterday: it is a phenomenon where different neurons in the same layer have a tendency to align with one another. That is the condensation phenomenon. It means that when neurons in the same layer condense with one another, we can cluster the neurons into groups, so the effective number of neurons in that layer is actually smaller than the number that is nominally there.
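As a rough illustration of how this alignment might be quantified in practice, here is a minimal sketch, assuming a plain NumPy setting; the similarity threshold, the helper name, and the toy example are my own placeholders, not something from the lecture.

```python
import numpy as np

def condensed_groups(W, tol=0.99):
    """Group hidden neurons whose input weight vectors point in (almost)
    the same or the opposite direction. W has shape (num_neurons, input_dim).
    Few groups relative to num_neurons indicates strong condensation."""
    directions = W / (np.linalg.norm(W, axis=1, keepdims=True) + 1e-12)
    cos = np.abs(directions @ directions.T)   # |cosine| of pairwise angles
    groups, assigned = [], np.zeros(len(W), dtype=bool)
    for i in range(len(W)):
        if assigned[i]:
            continue
        members = np.where(cos[i] >= tol)[0]  # neurons aligned with neuron i
        assigned[members] = True
        groups.append(members.tolist())
    return groups

# Toy example: 6 neurons in R^3 sharing only 2 distinct directions -> 2 groups.
rng = np.random.default_rng(0)
base = rng.normal(size=(2, 3))
W = np.vstack([base[i % 2] * rng.uniform(0.5, 2.0) for i in range(6)])
print(condensed_groups(W))   # e.g. [[0, 2, 4], [1, 3, 5]]
```

In this picture, the number of groups plays the role of the "effective number of neurons" mentioned above.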
Okay, so as I told you in the first lecture, if we are lucky enough to see a very informative piece of that new object, then we have a strong feeling that there should be neighboring pieces we could uncover. For example, you see a very clear condensation phenomenon, and because you do not understand it, you try to uncover the neighboring pieces in order to get a better picture of why there is condensation. Why do we observe this piece here? It is because it belongs to a bigger picture, and condensation is a kind of central piece that really helps you uncover more and more pieces of this object. However, if you look at other phenomena, many of them were also very prominent in their time, particularly double descent, which was very influential in the statistics community. Still, those phenomena are limited, not because they cannot be observed in certain situations or cannot be theorized, but because they cannot help you uncover more of the real object that is there. Condensation is different, and later, I hope that through all three lectures about condensation you will get a better feeling that condensation arises because there really is something there, and that we can obtain a better picture through all these works on condensation.
Okay, so now, since we mainly care about the loss landscape: what is a loss landscape? It is very simple. No matter what the training data or the target is, if you initialize the neural network itself with a small variance, you always observe condensation, and the reason lies in the loss landscape. In this loss landscape, the loss part, this little ℓ, is trivial, because usually we use some convex loss, for example L2; as a function of the distance between f and y it is convex, and there is nothing more to it. In what sense, then, do we say this loss landscape is non-convex? It is because f_θ is nonlinear. If we put any linear model there, the loss remains convex and there is nothing surprising; however, if we use a neural network as the model, that is, as the parameterization of these functions, then we arrive at a landscape that can be nasty, or let us say at least non-convex. So the loss landscape is just a function: a high-dimensional function of the loss, or we can say the empirical risk, with respect to the parameters θ.
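To fix notation, the following is a minimal sketch of the standard setup consistent with what is said here (the symbols R_S, n, m and the two-layer form are my own choices, not taken from the slides):

```latex
% Empirical risk with a convex (L2) loss and a nonlinear model f_theta.
\[
  R_S(\theta) \;=\; \frac{1}{n}\sum_{i=1}^{n} \ell\big(f_\theta(x_i),\, y_i\big),
  \qquad
  \ell(f, y) \;=\; \tfrac{1}{2}\,(f - y)^2 ,
\]
\[
  \text{e.g. a two-layer network}\quad
  f_\theta(x) \;=\; \sum_{k=1}^{m} a_k\,\sigma\!\big(w_k^{\top} x\big),
  \qquad \theta = (a_1, w_1, \dots, a_m, w_m).
\]
% \ell is convex in its first argument f, but R_S is generally non-convex
% in theta, because f_theta depends nonlinearly on theta.
```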
So why do we call it a loss landscape instead of just a function? It is because we have a certain picture in our mind: the optimization is trying to reach some minima among all these hills and obstacles, trying to get to the place with the minimum loss. This picture is really important; even for such high-dimensional objects we still keep this picture in mind, and people also try to plot these kinds of 2D visualizations, although you know that is never faithful. It is a high-dimensional function, and you will never be able to use a 2D visualization to fully understand it, but people still try.
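For concreteness, here is a minimal sketch of how such a 2D picture is usually produced: evaluate the training loss on a plane spanned by two random directions around the current parameters. The tiny two-layer network, the random data, and the grid range are placeholders of my own, not the speaker's setup.

```python
import numpy as np

# Toy two-layer network f_theta(x) = a^T tanh(W x) on placeholder data.
rng = np.random.default_rng(1)
X, y = rng.normal(size=(50, 3)), rng.normal(size=50)
W, a = rng.normal(size=(8, 3)) * 0.1, rng.normal(size=8) * 0.1

def loss(W, a):
    pred = np.tanh(X @ W.T) @ a          # predictions, shape (50,)
    return 0.5 * np.mean((pred - y) ** 2)

# Two random directions in parameter space, one component per parameter block.
dW1, da1 = rng.normal(size=W.shape), rng.normal(size=a.shape)
dW2, da2 = rng.normal(size=W.shape), rng.normal(size=a.shape)

# Loss on a 2D grid around the current parameters:
# L(alpha, beta) = loss(theta + alpha * d1 + beta * d2).
alphas = betas = np.linspace(-1.0, 1.0, 41)
surface = np.array([[loss(W + al * dW1 + be * dW2, a + al * da1 + be * da2)
                     for be in betas] for al in alphas])
print(surface.shape)  # (41, 41): one 2D slice of a much higher-dimensional landscape
```

One could then feed `alphas`, `betas`, and `surface` to a contour plot; the picture depends strongly on the chosen directions, which is exactly why such 2D plots can never fully capture the landscape.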
FAU MoD Course: Towards a Mathematical Foundation of Deep Learning: From Phenomena to Theory
Session 4: From Condensation to Loss Landscape Analysis
Speaker: Prof. Dr. Yaoyu Zhang
Affiliation: Institute of Natural Sciences & School of Mathematical Sciences, Shanghai Jiao Tong University
1. Mysteries of Deep Learning
2. Frequency Principle/Spectral Bias
3. Condensation Phenomenon
4. From Condensation to Loss Landscape Analysis
5. From Condensation to Generalization Theory