Okay, good afternoon, thank you for coming.
Today we are particularly happy: we have two lectures, the first one by Professor Holger
Rauhut. Then we have a coffee break in the department, and then there will
be the second lecture, a colloquium lecture of the department, by Professor
Rauhut. I think you all know Holger Rauhut; he is a professor at LMU Munich. Before that, until
basically one year ago or so, he was a professor in Aachen. Before that he got his Ph.D.
in Bavaria, at the Technical University of Munich, some 20 years back. He is very well known
in different areas. Probably you all know his book on sparse recovery or, as one says,
compressive sensing. More recently he has been very much engaged in the analysis
of machine learning and artificial intelligence algorithms, and this is the main topic of
his chair nowadays, and also the topic of this lecture, which will hopefully explain
the main reasons behind the celebrated implicit bias phenomenon in machine
learning. Thank you for joining.
Thanks very much for the kind invitation and introduction. This is about the implicit bias
phenomenon in deep learning, but let's first go one step back. I guess we have all seen
the tremendous success of deep learning or artificial intelligence in general, and deep
learning is basically the key methodology that makes these breakthroughs work, like
image recognition, sound recognition, analysis of social networks, ChatGPT, all these kinds
of things. The basic construction of these systems uses a lot of
mathematics, and this seems to work, but so far we haven't gotten to a situation
where we can really say we fully understand what's going on and why actually this is working,
so I will try to explain some of these phenomena, but we are still at the very beginning here.
Yeah, so as a mathematician, one would ask oneself, like okay, this is sort of constructed using
mathematics, so we should also be able to understand something using mathematics, and in particular,
can we actually prove something about deep learning? Now, there are several mathematical aspects
here. One is optimization: machine learning usually uses some kind of training data in order to adapt
a certain system to the real world. We do not do this by thinking about how the world works
and then figuring out principles, as in physics, but rather by example: we give the system a lot of
examples, and the system has to figure out the patterns. The key methodology to fit these models
to data is by setting up an optimization problem and then solving it, but in particular, in deep learning,
this all leads to non-convex optimization problems, which are notoriously hard to understand, and so the question is
can we provide some understanding? Now, the next item here is generalization properties, so the crucial question is
then: if we feed enough data into the system, how does it perform on new data?
So we want to predict something, but we only have seen certain amounts of data and not seen every possible configuration
or possible situation, so the question is how do we perform on unseen data? Then there are questions of approximation theory,
stability properties, or designing certain networks for a specific task, but I will focus on these first two aspects,
optimization and generalization. One particular phenomenon I talk about is this implicit bias phenomenon,
and this will highlight also the role of sparsity or more generally networks of low complexity.
Okay, so what is a neural network? I guess most of you should have seen something like that.
So a network is built up of layers, and these layers are simple affine functions composed with a non-linearity which acts component-wise.
And so what we have to do, we have to adapt these matrices and these offsets,
such that the whole network, which is a composition of these layers, works well for certain tasks.
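As a minimal sketch of this construction (assuming NumPy; the layer sizes, the ReLU nonlinearity, and all names here are illustrative, not the specific network from the lecture):

```python
import numpy as np

def relu(z):
    # the non-linearity, applied component-wise
    return np.maximum(z, 0.0)

def forward(x, weights, biases):
    """Evaluate a feed-forward network: a composition of layers,
    each an affine map (matrix plus offset) followed by the non-linearity.
    """
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(W @ a + b)          # affine map, then component-wise non-linearity
    # the last layer is typically purely affine
    return weights[-1] @ a + biases[-1]

# a tiny 3 -> 4 -> 2 network with random parameters (the objects to be adapted)
rng = np.random.default_rng(0)
weights = [rng.standard_normal((4, 3)), rng.standard_normal((2, 4))]
biases = [rng.standard_normal(4), rng.standard_normal(2)]
y = forward(rng.standard_normal(3), weights, biases)
print(y.shape)  # (2,)
```

Training then means adjusting the entries of `weights` and `biases` so that this composition performs well on the given task.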
Okay, so this is just one simple example. There are all kinds of networks; you can impose structure on these matrices and so on.
I will not go much into different architectures, but you can imagine that there are lots of possibilities to design these networks,
and that's the art in practice: finding architectures which work well for the problem at hand.
Okay, so the training works like this: you are given input-output pairs. The x1, x2, and so on are inputs, and the y1 up to ym are labels.
And so what we want to do, we want to find a network such that for a given input, it basically reproduces the output, and hopefully does this also then on new data.
So what one usually does is set up a so-called loss functional: one starts with a loss function, which simply measures how far the output of the network on a given input is from the given label,
and then adds this up over all data points. Intuitively, if we minimize this over the parameters, we should get a neural network which, at least on these data, produces more or less the given outputs.
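A sketch of such a loss functional, with hypothetical names throughout (a squared-error loss, and a toy linear model standing in for the network):

```python
import numpy as np

def squared_loss(theta, data, model):
    """Empirical loss functional: sum over all pairs (x_i, y_i) of the
    per-sample loss || model(x_i, theta) - y_i ||^2.
    """
    return sum(np.sum((model(x, theta) - y) ** 2) for x, y in data)

# toy example: a linear "network" and two labeled data points
model = lambda x, theta: theta @ x
data = [(np.array([1.0, 0.0]), np.array([1.0])),
        (np.array([0.0, 1.0]), np.array([2.0]))]

theta = np.array([[1.0, 2.0]])
print(squared_loss(theta, data, model))  # 0.0 -- these parameters fit the data exactly
```

Minimizing `squared_loss` over `theta` is exactly the optimization problem described above; for a deep network this problem becomes non-convex.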
Okay, and so in practice, one uses rather simple algorithms, I mean first-order methods. We simply start at a certain point, the initialization, and then compute the gradient of the loss with respect to all of these matrices and offsets.
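A sketch of the resulting first-order method, plain gradient descent, on a simple quadratic stand-in for the training loss (the function names and step size are illustrative; in deep learning the gradient would be computed by backpropagation, and the loss is non-convex):

```python
import numpy as np

def gradient_descent(grad, theta0, step=0.1, n_steps=500):
    """First-order method: repeatedly step against the gradient of the loss."""
    theta = theta0
    for _ in range(n_steps):
        theta = theta - step * grad(theta)   # one first-order update
    return theta

# stand-in loss L(theta) = ||A theta - y||^2, with gradient 2 A^T (A theta - y)
A = np.array([[1.0, 0.0],
              [1.0, 1.0]])
y = np.array([1.0, 2.0])
grad = lambda th: 2.0 * A.T @ (A @ th - y)

theta = gradient_descent(grad, np.zeros(2))
print(np.round(theta, 3))  # close to the minimizer [1. 1.]
```

On the actual non-convex training loss the same iteration is run, but which of the many minimizers it selects is exactly where the implicit bias phenomenon enters.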
Presenters
Prof. Dr. Holger Rauhut
Prof. Dr. Christian Bär
Accessible via
Open access
Duration
01:50:23 min
Recording date
2024-12-03
Uploaded on
2024-12-19 23:46:04
Language
en-US
Event: FAU MoD Lecture
Event type: On-site / Online
Organized by: FAU MoD, the Research Center for Mathematics of Data at Friedrich-Alexander-Universität Erlangen-Nürnberg (Germany)
Speaker: Prof. Dr. Holger Rauhut
Affiliation: Mathematisches Institut der Universität München (Germany)
Speaker: Prof. Dr. Christian Bär
Affiliation: Institut für Mathematik, Universität Potsdam (Germany)