SIGMathLing Seminar -- Abdou Youssef: Machine Learning for Math: Vision and Intermediate Milestones

An ongoing conversation.

Okay. So tagging: in fact, I published a paper on that at CICM a couple of years ago, and some people I know of, some collaborating with me and some working on their own, have been using my POM tagger. POM stands for part-of-math; it is the counterpart of part-of-speech tagging in general NLP, so I thought that would be an appropriate acronym. After you have tokenized an equation, or any sequence of math symbols, you then determine what every token stands for: is it an operation? Is it a variable? Is it a function? And so on. So tagging is another fundamental task.

And again, for the sake of time, you will see in the slides my formatting of the data set that would be useful for tagging. My POM tagger was very much syntactical, and it is really only a first step at math tagging; I left the rest of the job pretty much for deep learning to do, and we are beginning to do some of that. So we need data sets for tagging as well. Again, the input, the x part, would be a sequence of tokens, this time math tokens, and the output would be, for each token, the most appropriate tag, as in the sketch below. So again, for the sake of time, I'm not going to spend much time on the specifics of this.
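To make the shape of such a data set concrete, here is a minimal sketch; the tokens and tag names are hypothetical illustrations, not the actual POM tag set or the format shown in the slides.

```python
# Minimal sketch of one math-tagging sample (hypothetical tags, not the
# actual POM tag set): x is a tokenized equation, y is one tag per token.

# Tokens of "f(x) = x^2 + 1"
x = ["f", "(", "x", ")", "=", "x", "^", "2", "+", "1"]

# Tags aligned with x
y = ["FUNCTION", "OPEN_PAREN", "VARIABLE", "CLOSE_PAREN", "RELATION",
     "VARIABLE", "OPERATOR", "NUMBER", "OPERATOR", "NUMBER"]

assert len(x) == len(y)  # tagging is a per-token labeling task

# A tagging data set is then a collection of such (x, y) pairs.
dataset = [(x, y)]
```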

The third task is math term disambiguation. In a way, you could view disambiguation as a subtask of tagging, because to be able to tag mathematical tokens correctly, the tagger will often encounter an ambiguity: say, is this a binary operation or a unary operation? Is this U a variable or a function? And so on and so forth. Disambiguation is such an important subtask that we should elevate it to a task in its own right and develop data sets for it as well. Again, I develop here a format for the kinds of samples, or instances, that should go into a data set for disambiguation. And we are, in fact, as we speak, in the process of developing a data set like that.
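As a rough illustration of what one such instance might look like (the field names and label below are hypothetical, not the actual format from the slides):

```python
# Hypothetical disambiguation instance: given an ambiguous token in its
# surrounding context, the label says what the token means here.

instance = {
    # tokenized context containing the ambiguous token
    "context": ["f", "'", "(", "x", ")", "=", "2", "x"],
    # index of the token to disambiguate (here, the prime)
    "target_index": 1,
    # gold label resolving the ambiguity
    "label": "DERIVATIVE",
}
```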

Those of you who attended the last CICM may remember that one of my students, who did his master's with me, published his paper there and presented it last July; he is now doing his PhD with me, so he is going to continue working on this. He looked into math ambiguity, how to disambiguate, and how to use machine learning models, both classical machine learning and deep learning. But theses in the United States are not as extensive as in Europe, so he had to take a small chunk of the problem. He looked into disambiguation of superscripts, because a superscript could be a power, part of a name, a higher-order differentiation, or the upper bound of some summation or integral. He also looked into disambiguating primes, because a prime could stand either for a derivative or for part of a name. And, for good measure, he looked into disambiguating what gamma means in its different contexts. He developed data sets for these, labeled them, and trained different models.
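To make the superscript case concrete, labeled samples might look like the following; the four classes are the ones just listed, but the label strings and examples are hypothetical.

```python
# Sketch of labeled superscript-disambiguation samples. The four meanings
# come from the talk; the label names and examples are illustrative only.

SUPERSCRIPT_LABELS = ["POWER", "NAME_PART", "DERIVATIVE_ORDER", "UPPER_BOUND"]

samples = [
    (["x", "^", "2"],                          "POWER"),             # x squared
    (["x", "^", "*"],                          "NAME_PART"),         # x-star, a name
    (["f", "^", "(", "n", ")", "(", "x", ")"], "DERIVATIVE_ORDER"),  # n-th derivative
    (["\\sum", "_", "i=1", "^", "n"],          "UPPER_BOUND"),       # bound of a sum
]
```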

Without spending too much time there, here is a quick snapshot of what he found. He trained three different classical machine learning models: decision trees, random forests (which are collections of decision trees), and support vector machines. These are the bread and butter, the most powerful models in classical machine learning. He also looked into deep learning, specifically LSTMs, knowing that because the data sets were small, deep learning would not do that well. For the sake of time, let me quickly say that the three classical machine learning models gave us good accuracy, in the 80 percent range. LSTM did not do as well: its accuracy for disambiguating these symbols, the prime, the superscript, and the gamma, was quite low. Disappointingly low, but not surprisingly low, because, again, our data sets were too small. We need a lot more data for deep learning: deep learning models have many more parameters than the classical machine learning models, and that is why they need more data. So we got, as I said, good results, and I have jumped very quickly to some conclusions from his work: machine learning is certainly applicable to math disambiguation.
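As a rough sketch of the kind of classical-model setup described here, assuming scikit-learn and a toy feature representation (the data, features, and task framing are hypothetical stand-ins, not the student's actual pipeline):

```python
# Sketch: the three classical models from the talk on a toy prime-
# disambiguation task. Data and features are hypothetical stand-ins.
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Toy labeled contexts (space-separated tokens), repeated so there are
# enough samples for a train/test split.
contexts = [
    "f ' ( x ) = 2 x",           # prime as derivative
    "y ' = f ( x , y )",         # prime as derivative
    "let x ' be the new point",  # prime as part of a name
    "the point x ' is updated",  # prime as part of a name
] * 10
labels = ["DERIVATIVE", "DERIVATIVE", "NAME_PART", "NAME_PART"] * 10

# Bag-of-words over the surrounding tokens as a simple feature set
# (token_pattern keeps single-character math tokens like "'" and "x").
X = CountVectorizer(token_pattern=r"\S+").fit_transform(contexts)
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.25, random_state=0)

for model in (DecisionTreeClassifier(), RandomForestClassifier(), SVC()):
    model.fit(X_train, y_train)
    acc = accuracy_score(y_test, model.predict(X_test))
    print(type(model).__name__, f"accuracy: {acc:.2f}")
```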

And because of the lack of large-scale data sets, we cannot exploit deep learning's full potential yet. Rather, we need to use some of the classical machine learning models that do not need as much data, and they deliver good performance. I mean, once you have accuracy in the 80 percent range,
