Good afternoon to all of you. Thank you for joining. It is a great pleasure for me to present
the speaker of today's Maud lecture, François Charton. François works at Meta in Paris, and he
has developed a very bright and non-standard career in science. He will maybe tell you more
about his own path. He was a student, let me start from the very beginning, he was a student
at École Polytechnique in France, which we all know, and he has been working at Meta
since 2017. 2019, yeah. So he was hired just before the Covid era, right? Yeah. And actually I knew about
his name because of the joint work he has written, in particular with Amaury Hayat, which was
presented in a lecture in this series a couple of years ago. In Amaury Hayat's presentation,
he was lecturing in particular on how to use artificial intelligence and machine learning
ideas to tackle some of the classical problems in dynamical systems and in controlling
dynamical systems, for instance computing Lyapunov functions and discovering new Lyapunov
functions. And after Amaury's lecture I thought that we should have a second one by François,
with his own point of view, which is much closer, I would say, to technological transfer and
innovation. So thank you again, François, for your visit, and the stage is yours.

Okay, hi. So yes,
I'm François Charton. I work at Meta, and also at École des Ponts, in CERMICS with Amaury Hayat.
At Meta I work in FAIR, which is a research laboratory that belongs to Meta, so we are some of the
few people inside Meta who are sort of allowed to do some kind of blue-sky research. I'm on the
blue-sky part of the lab, which is not always easy. And so my subject is AI for mathematics:
how do you use AI to do mathematics? I will talk about two things. First, using AI to do
mathematical discovery: I'm not interested in doing old math, I want to do new math, I want to
do discovery with AI. And second, using math for AI: you know, what math can teach you about
how AI models learn, and maybe how to improve AI models.

Okay, so very quickly, a few words about architecture. I'm working with what is, you know, the
standard model in AI these days, the transformer; this is what powers ChatGPT and all the large
language models that you know. It is originally, and this is important, an architecture that was
meant for automatic translation, from one language into another, and it was introduced in 2017
with this in mind, with translation in mind. A transformer, at least the kind of transformer I
use, is made of two parts. You've got an encoder that processes one language, and you've got a
decoder that processes the other language, so the source language and the target language. The
role of the encoder is to transform your sentence, here in French, first into a set of vectors
in a high-dimensional space, so mathematical stuff, which is then processed by the many layers
of the encoder into a mathematical representation: a sequence of high-dimensional vectors, in
this case four of them.
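
To make this concrete, here is a minimal sketch of the encoder side in PyTorch. All the sizes,
token ids and variable names are illustrative assumptions, not the actual model from the talk.

```python
import torch
import torch.nn as nn

# Illustrative sizes, not those of any particular model.
vocab_size, d_model, n_layers = 10000, 512, 6

embed = nn.Embedding(vocab_size, d_model)  # words -> high-dimensional vectors
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True),
    num_layers=n_layers,
)

# A hypothetical tokenization of the French source sentence (four tokens).
# Positional encodings are omitted for brevity.
src_tokens = torch.tensor([[23, 87, 451, 902]])  # shape: (batch=1, seq_len=4)
memory = encoder(embed(src_tokens))              # shape: (1, 4, 512)
# `memory` is the internal representation of the input that the decoder reads.
```
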
The decoder then does the English, or German, or Chinese part, and the way it does it is that it
spits out the words one at a time. So basically, here we are in the situation where the first two
words, "Under the", have already been produced, and you are going to feed the decoder with what
you have decoded so far, and also with the representation produced by the encoder, so the
internal representation of the inputs, and it is going to spit out the next token, which is
"Mirabeau". You might notice this is interesting, because in French the third word is "pont",
bridge, but the transformer knows that in English you have to change the word order. So that's
the way it basically works.
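
This decoding loop can be sketched as follows, continuing the PyTorch example above (reusing
`embed`, `memory`, `d_model`, `vocab_size` and `n_layers`). The special token ids and the greedy
token choice are assumptions for illustration; a real decoder would also pass a causal mask.

```python
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model=d_model, nhead=8, batch_first=True),
    num_layers=n_layers,
)
to_vocab = nn.Linear(d_model, vocab_size)  # vectors back to scores over words

BOS, EOS = 1, 2                      # hypothetical start/end-of-sentence ids
out_tokens = torch.tensor([[BOS]])   # what has been decoded so far

for _ in range(20):                  # produce at most 20 target tokens
    # Feed everything decoded so far, plus the encoder's representation.
    # (We reuse the same embedding for both languages, for simplicity.)
    h = decoder(embed(out_tokens), memory)
    # Greedily pick the most likely next token from the last position.
    next_token = to_vocab(h[:, -1]).argmax(dim=-1, keepdim=True)
    out_tokens = torch.cat([out_tokens, next_token], dim=1)  # append, repeat
    if next_token.item() == EOS:
        break
```
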
These encoders and decoders are made of many layers, tens of layers in ChatGPT and models like
that. And a transformer layer is a little complicated, more complicated than the typical layer
of a neural net that you know. The way it functions: you have input vectors, I1 to I4, that are
connected to the output vectors by what is called the residual connection, so the input is just
copied, and then you add stuff to it. And you add two kinds of stuff. You add attention, which is
something that takes care of the correlations between the different words in the sentence, so
between the different vectors; you could think of it as decorrelating the different elements,
just as you would process a time series and eliminate, you know, monthly seasonality or
something like that. And then the second part, the FFN, the feed-forward network, is a regular
three-layer neural net, as everybody has been using since the 1980s or so. So the transformer
layer is a fairly complicated thing, and you stack many of them on top of each other.
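
As a rough sketch, one such layer might be written like this in PyTorch. This follows the
standard design from the 2017 paper (layer norms and other details vary between
implementations), not a specific model from the talk.

```python
import torch.nn as nn

class TransformerLayer(nn.Module):
    """One transformer layer: attention and a feed-forward net, each
    added back onto its input through a residual connection."""

    def __init__(self, d_model=512, nhead=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        # The FFN: a classic small neural net applied to each position.
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):  # x: (batch, seq_len, d_model)
        # Residual connection: copy the input, add the attention output.
        # Attention mixes information across positions in the sequence.
        x = self.norm1(x + self.attn(x, x, x)[0])
        # Second residual connection: add the feed-forward output.
        x = self.norm2(x + self.ffn(x))
        return x
```
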