Good afternoon to all of you. Thank you for joining. It is a great pleasure for me to present
the speaker of today's Maud lecture, François Charton. François works at Meta in Paris, and he
has developed a very bright and non-standard career in science. He will maybe tell you more
about his own path. He was a student, let me start from the very beginning, he was a student
at École Polytechnique in France, which we all know, and he has been working at Meta
since 2017. 2019, yeah. So he was hired just before the Covid era, right? Yeah. And actually I knew about
his name because of the joint work he has written, in particular with Amaury Hayat, which was
presented in a lecture in this series a couple of years ago. In Amaury Hayat's presentation,
he was lecturing in particular on how to use artificial intelligence and machine learning
ideas to tackle some of the classical problems in dynamical systems and in controlling
dynamical systems, for instance computing Lyapunov functions and discovering new Lyapunov
functions. And after Amaury's lecture I thought that we should have a second one by François,
with his own point of view, which is much closer, I would say, to technological transfer and
innovation. So thank you again, François, for your visit, and the stage is yours.

Okay, hi. So yes,
I'm François Charton. I work at Meta, and also at École des Ponts, in CERMICS with Amaury Hayat.
At Meta I work in FAIR, which is a research laboratory that belongs to Meta, so we are some of the
few people inside Meta who are sort of allowed to do some kind of blue-sky research. I'm on the
blue-sky part of the lab, which is not always easy. And so my subject is AI for mathematics:
how do you use AI to do mathematics? I will talk about two things. First, using AI to do
mathematical discovery: I'm not interested in doing old math, I want to do new math, I want to
do discovery with AI. And second, using math for AI: you know, what math can teach you about
how AI models learn, and maybe how to improve AI models.

Okay, so very quickly, a few words about architecture. I'm working with what is, you know, the
standard model in AI these days, the transformer; this is what powers ChatGPT and all the large
language models that you know. It is originally, and this is important, an architecture that was
meant for automatic translation, from one language into another, and it was introduced in 2017
with this in mind, with translation in mind. A transformer, at least the kind of transformer I
use, is made of two parts. You've got an encoder that processes one language, and you've got a
decoder that processes the other language, so the source language and the target language. The
role of the encoder is to transform your sentence, here in French, first into a set of vectors
in a high-dimensional space, so mathematical stuff, which is then processed by the many layers
of the encoder into a mathematical representation: a sequence of high-dimensional vectors, in
this case four of them.
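
To make this concrete, here is a minimal sketch of the encoder side in PyTorch. All the sizes,
token ids and variable names are illustrative assumptions, not the actual model from the talk.

```python
import torch
import torch.nn as nn

# Illustrative sizes, not those of any particular model.
vocab_size, d_model, n_layers = 10000, 512, 6

embed = nn.Embedding(vocab_size, d_model)  # words -> high-dimensional vectors
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=d_model, nhead=8, batch_first=True),
    num_layers=n_layers,
)

# A hypothetical tokenization of the French source sentence (four tokens).
# Positional encodings are omitted for brevity.
src_tokens = torch.tensor([[23, 87, 451, 902]])  # shape: (batch=1, seq_len=4)
memory = encoder(embed(src_tokens))              # shape: (1, 4, 512)
# `memory` is the internal representation of the input that the decoder reads.
```
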
The decoder then does the English, or German, or Chinese part, and the way it does it is that it
spits out the words one at a time. So basically, here we are in the situation where the first two
words, "Under the", have already been produced, and you are going to feed the decoder with what
you have decoded so far, and also with the representation produced by the encoder, so the
internal representation of the inputs, and it is going to spit out the next token, which is
"Mirabeau". You might notice this is interesting, because in French the third word is "pont",
bridge, but the transformer knows that in English you have to change the word order. So that's
the way it basically works.
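
This decoding loop can be sketched as follows, continuing the PyTorch example above (reusing
`embed`, `memory`, `d_model`, `vocab_size` and `n_layers`). The special token ids and the greedy
token choice are assumptions for illustration; a real decoder would also pass a causal mask.

```python
decoder = nn.TransformerDecoder(
    nn.TransformerDecoderLayer(d_model=d_model, nhead=8, batch_first=True),
    num_layers=n_layers,
)
to_vocab = nn.Linear(d_model, vocab_size)  # vectors back to scores over words

BOS, EOS = 1, 2                      # hypothetical start/end-of-sentence ids
out_tokens = torch.tensor([[BOS]])   # what has been decoded so far

for _ in range(20):                  # produce at most 20 target tokens
    # Feed everything decoded so far, plus the encoder's representation.
    # (We reuse the same embedding for both languages, for simplicity.)
    h = decoder(embed(out_tokens), memory)
    # Greedily pick the most likely next token from the last position.
    next_token = to_vocab(h[:, -1]).argmax(dim=-1, keepdim=True)
    out_tokens = torch.cat([out_tokens, next_token], dim=1)  # append, repeat
    if next_token.item() == EOS:
        break
```
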
These encoders and decoders are made of many layers, tens of layers in ChatGPT and models like
that. And a transformer layer is a little complicated, more complicated than the typical layer
of a neural net that you know. The way it functions: you have input vectors, I1 to I4, that are
connected to the output vectors by what is called the residual connection, so the input is just
copied, and then you add stuff to it. And you add two kinds of stuff. You add attention, which is
something that takes care of the correlations between the different words in the sentence, so
between the different vectors; you could think of it as decorrelating the different elements,
just as you would process a time series and eliminate, you know, monthly seasonality or
something like that. And then the second part, the FFN, the feed-forward network, is a regular
three-layer neural net, as everybody has been using since the 1980s or so. So the transformer
layer is a fairly complicated thing, and you stack many of them on top of each other.
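
As a rough sketch, one such layer might be written like this in PyTorch. This follows the
standard design from the 2017 paper (layer norms and other details vary between
implementations), not a specific model from the talk.

```python
import torch.nn as nn

class TransformerLayer(nn.Module):
    """One transformer layer: attention and a feed-forward net, each
    added back onto its input through a residual connection."""

    def __init__(self, d_model=512, nhead=8, d_ff=2048):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, nhead, batch_first=True)
        # The FFN: a classic small neural net applied to each position.
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model)
        )
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):  # x: (batch, seq_len, d_model)
        # Residual connection: copy the input, add the attention output.
        # Attention mixes information across positions in the sequence.
        x = self.norm1(x + self.attn(x, x, x)[0])
        # Second residual connection: add the feed-forward output.
        x = self.norm2(x + self.ffn(x))
        return x
```
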