So I thought I would present to you something I didn't have time to present yesterday.
This is not finished, there's no paper, this is work in progress, this is what keeps me
up at night.
Don't scoop me, I haven't published this anywhere.
So you see, yesterday at the end of the talk, we left off having been able to calculate
the GCD with fairly good accuracy.
So we showed that a Transformer can learn relatively complex arithmetic functions, more complex
than, say, multiplication.
You need to work for it, maybe by adjusting the training distribution and so on.
And you can usually understand what the model is doing just by looking at the model's predictions.
So it's an interesting approach to interpretability because usually when people want to do interpretability
they look at the weights.
The problem is that even in a very small model you have hundreds of thousands if not millions
of weights.
So how do you make sense of a million things, most of them random-looking?
It's extremely difficult.
Looking at model prediction provides an avenue for explanation of a simpler nature.
But this was a fairly simple case: it was mostly a classification problem, because you
just have a small number of possible outcomes.
GCDs 1 to 100 account for about 99% of all examples, of all the GCDs you
will see in your training set.
So the task of the model there is to take the inputs, classify them into a small number
of classes corresponding to single outcomes, and then assign an outcome to every class.
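As a quick sanity check on this class structure, one can tabulate how often each small GCD occurs among uniform pairs of integers; asymptotically, gcd = d occurs with density 6/(π² d²), so a handful of small classes dominates. A minimal sketch (not the exact training distribution used in the talk):

```python
from math import gcd
from collections import Counter

# Tabulate GCD frequencies over all pairs (a, b) with 1 <= a, b <= N.
# Asymptotically, gcd = d occurs with density 6 / (pi^2 * d^2), so a
# small number of outcome classes covers almost all examples.
N = 1000
counts = Counter(gcd(a, b) for a in range(1, N + 1) for b in range(1, N + 1))
total = N * N
for bound in (10, 100):
    share = sum(counts[d] for d in range(1, bound + 1)) / total
    print(f"share of pairs with gcd <= {bound}: {share:.3f}")
```

Over pairs up to 1,000 this gives roughly 94% coverage for GCDs up to 10 and about 99% for GCDs up to 100; the exact figures depend on the operand distribution used for training.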
What if we worked on a much harder problem?
So I want to talk here about the infamous Collatz problem, which I will show is a regression
problem.
So it's typical in machine learning to distinguish between pattern recognition, or classification,
where you want to distinguish between two or three or however many kinds of elements,
and regression, where you want to estimate a function; and admittedly regression is a much
more difficult thing.
So, Collatz: you probably know the Collatz sequence; it's this funny mathematical
object. You take a positive integer; if it's odd, you multiply it by 3 and add 1,
and if it's even, you divide it by 2, and then you get a sequence of
integers, one after the other.
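The rule just described fits in a few lines (a minimal sketch in plain Python):

```python
def collatz_step(n: int) -> int:
    """One Collatz step: 3n + 1 if n is odd, n // 2 if n is even."""
    return 3 * n + 1 if n % 2 else n // 2

def collatz_sequence(n: int, steps: int) -> list[int]:
    """The trajectory of n under `steps` applications of the Collatz map."""
    seq = [n]
    for _ in range(steps):
        n = collatz_step(n)
        seq.append(n)
    return seq

# Starting from 7, the sequence reaches 1 (and then the 4, 2, 1 loop) after 16 steps.
print(collatz_sequence(7, 16))
```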
So, about the Collatz sequence: there's a very famous conjecture that says that every
sequence ends up in the loop 1, 4, 2, 1, 4, 2, 1, and so on. Nobody
has been able to prove it; people are advised not to work on it, unless you're Terence
Tao, because it's too difficult. Erdős said
that mathematics is not ready for such problems, etc.
So I'm not interested in the Collatz conjecture; I'm interested in the sequence, and the sequence
has an amusing property.
Suppose you start with an odd number of the form 2^k m − 1, where m is odd.
So 2^k is the largest power of 2 that divides the number plus 1; in other words, k is
the number of zeros at the end of the binary representation of the number plus 1, or
equivalently the number of ones at the end of the binary representation of the number
itself, right?
If you have such a number, then after exactly 2k Collatz steps, the number 2^k m −
1 is transformed into 3^k m − 1, which is an even number because m is odd; and
then you have some number k′ of down steps, divisions by 2, where k′ depends on m, obviously.
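This claim can be checked mechanically; in the sketch below, k is read off as the number of trailing ones in the binary representation, as described above:

```python
def collatz_step(n):
    """One Collatz step: 3n + 1 if odd, n // 2 if even."""
    return 3 * n + 1 if n % 2 else n // 2

def trailing_ones(n):
    """k such that n = 2^k * m - 1 with m odd (trailing zeros of n + 1)."""
    k = 0
    while n & 1:
        k += 1
        n >>= 1
    return k

# For odd n = 2^k * m - 1 (m odd), 2k Collatz steps yield 3^k * m - 1.
for k in range(1, 6):
    for m in (1, 3, 5, 7, 9):
        n = 2**k * m - 1
        assert trailing_ones(n) == k
        for _ in range(2 * k):
            n = collatz_step(n)
        assert n == 3**k * m - 1
print("identity verified for k = 1..5, m = 1, 3, 5, 7, 9")
```

For example, 39 = 2^3 · 5 − 1 reaches 3^3 · 5 − 1 = 134 after exactly six steps.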
So I'm defining, on the Collatz sequence, something I call long Collatz steps, which are these
transforms: I start with an odd number, I apply n → 3n + 1, then
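Putting the two phases together, a long Collatz step, as far as it can be reconstructed from the description above, maps an odd number 2^k m − 1 to the odd number obtained after the 2k up/down steps and the k′ subsequent divisions by 2. A sketch (the function name `long_collatz_step` is mine):

```python
def collatz_step(n):
    """One plain Collatz step, used for cross-checking."""
    return 3 * n + 1 if n % 2 else n // 2

def long_collatz_step(n):
    """One long Collatz step (name is mine): for odd n = 2^k * m - 1 with
    m odd, jump to 3^k * m - 1 (the 2k up/down steps), then divide by 2
    until odd (the k' down steps)."""
    assert n % 2 == 1
    k, t = 0, n + 1
    while t % 2 == 0:          # k = number of trailing zeros of n + 1
        k += 1
        t //= 2
    m = t                      # m is odd
    r = 3**k * m - 1           # value after the 2k Collatz steps
    while r % 2 == 0:          # the k' down steps
        r //= 2
    return r

# Cross-check against plain iteration, starting from 39 = 2^3 * 5 - 1.
target = long_collatz_step(39)
n, steps = 39, 0
while n != target:
    n = collatz_step(n)
    steps += 1
print(target, "reached after", steps, "plain steps")
```

From 39 this returns 67, matching seven plain Collatz steps: 2k = 6 up/down steps to 134, plus k′ = 1 division by 2.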
Presenter: Dr. François Charton
Access: Open access
Duration: 00:25:17 min
Recording date: 2025-06-24
Uploaded on: 2025-06-25 07:17:24
Language: en-US