Okay. So tagging: in fact, I published a paper on that at CICM a couple of years ago,
and some people I know, some I'm collaborating with and some working on their own,
have been using my POM tagger. POM stands for Part-of-Math, a counterpart to
part-of-speech tagging in general NLP, so I thought that would be an appropriate
acronym for it. The task is this: after you've tokenized an equation, or any sequence of
math symbols, you then determine what every token stands for. Is it an operation?
Is it a variable? Is it a function? And so on. So tagging is another fundamental task.
Again, for the sake of time, you will see in the slides my formatting of a dataset that
would be useful for tagging. My POM tagger was very much syntactic, but it is really only
a first step at math tagging, and I left the remaining parts of it largely for
deep learning to do. We're beginning to do some of that, so we therefore need datasets
for tagging. Again, the input, the x part, would be a sequence of tokens, this time math tokens,
and the output would be the most appropriate tag for each token. Again, for the sake of time,
I'm not going to spend much time on the specifics of this.
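To make the format concrete, here is a minimal sketch of what one tagging sample might look like, assuming a simple Python dictionary representation; the field names and tag set are my illustration, not the exact format from the slides.

```python
# A hypothetical sample for a math-tagging dataset. The field names and the
# tag set are illustrative assumptions, not the exact format from the slides.
sample = {
    # tokenized form of: f(x) = x^2 + 1
    "x": ["f", "(", "x", ")", "=", "x", "^", "2", "+", "1"],
    # one tag per token
    "y": ["FUNCTION", "OPEN_DELIM", "VARIABLE", "CLOSE_DELIM", "RELATION",
          "VARIABLE", "SUPERSCRIPT", "NUMBER", "BINARY_OP", "NUMBER"],
}

assert len(sample["x"]) == len(sample["y"])  # tags are token-aligned
```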
The third task is math term disambiguation. In a way, you could view disambiguation as a subtask of tagging,
because to tag mathematical tokens correctly, the tagger will often
encounter an ambiguity: is this a binary operation or a unary operation? Is this
u a variable or a function? And so on. Disambiguation is such an important subtask,
in its own right, that we should elevate it to a task and develop datasets for it as well. So again,
I developed here a format for the kinds of samples, or instances, that should go into a dataset for
disambiguation. And we are in fact, as we speak, in the process of developing such a dataset.
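As a concrete illustration, a single instance in such a dataset might look like the following sketch, where the field names and sense labels are my assumptions rather than the actual format:

```python
# A hypothetical disambiguation instance: an ambiguous token in context plus
# the correct sense. Field names and labels are illustrative assumptions.
sample = {
    "tokens": ["u", "(", "t", ")", "+", "u"],  # is u a function or a variable?
    "target_index": 0,                         # which occurrence to disambiguate
    "label": "FUNCTION",                       # vs. "VARIABLE"
}
```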
Those of you who attended the last CICM may have seen that one of my students, who did
his master's with me, published his paper there and presented it, last July; now he's doing
his PhD with me, so he's going to continue working on this. He looked into math ambiguity,
how to disambiguate, and how to use machine learning models, both classical machine learning and
deep learning. But, you know, master's theses in the United States are not as extensive
as in Europe, so he had to take a small chunk of the problem. He looked into disambiguation of
superscripts, because a superscript could be a power, part of a name,
the order of a higher-order derivative, or the upper bound of a summation or
integral. He also looked into disambiguating primes, because a prime could stand
for a derivative or be part of a name. And, for good measure, he also looked at disambiguating what
gamma stands for in its different contexts. He developed datasets, labeled them, and trained different models.
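In code form, the sense inventories for these three experiments might look something like this; the label names are illustrative guesses, not the labels from his paper:

```python
# Possible sense inventories for the three ambiguity types studied; the label
# names are illustrative guesses, not the exact labels from the paper.
SUPERSCRIPT_SENSES = {"POWER", "PART_OF_NAME", "DERIVATIVE_ORDER", "UPPER_BOUND"}
PRIME_SENSES       = {"DERIVATIVE", "PART_OF_NAME"}
GAMMA_SENSES       = {"GAMMA_FUNCTION", "VARIABLE", "OTHER"}  # gamma's senses vary by context
```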
Without spending too much time there, here is a quick snapshot of what he found.
He trained three different classical machine learning models: decision trees,
random forests, which are collections of decision trees, and support vector machines. These are
the bread and butter, among the most powerful models in classical machine
learning. He also looked into deep learning, specifically an LSTM, knowing that because the
datasets were small, deep learning would not do that well. Again, for the sake of time, let me
quickly say that the three classical machine learning models gave us good accuracy,
around the 80 percent range.
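For a rough sense of what such an experiment looks like in practice, here is a minimal scikit-learn sketch comparing the three classical models; the toy data and bag-of-tokens features are placeholders, not his actual pipeline:

```python
# Minimal sketch: compare decision tree, random forest, and SVM on a toy
# superscript-disambiguation task. Data and features are placeholders.
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Toy corpus: a context window around an ambiguous superscript, plus a label.
contexts = ["x ^ 2 + 1", "f ^ ( n ) ( x )", "sum _ { i = 1 } ^ n", "A ^ T"]
labels   = ["POWER", "DERIVATIVE_ORDER", "UPPER_BOUND", "PART_OF_NAME"]

X = CountVectorizer(token_pattern=r"\S+").fit_transform(contexts)
X_tr, X_te, y_tr, y_te = train_test_split(X, labels, test_size=0.25, random_state=0)

for model in (DecisionTreeClassifier(), RandomForestClassifier(), SVC()):
    model.fit(X_tr, y_tr)
    print(type(model).__name__, accuracy_score(y_te, model.predict(X_te)))
```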
The LSTM did not do as well: its accuracy for disambiguating these symbols, the prime,
the superscript, and the gamma, was quite low. Disappointingly low, but not surprisingly so,
because again, our datasets were too small. Deep learning models have many more parameters
than classical machine learning models, and that's why they need a lot more data.
So, as I said, we got good results. Jumping very quickly to some conclusions from his work:
machine learning is certainly applicable to math disambiguation, but because of the lack of
large-scale datasets, we can't exploit its full potential yet. Rather, we need to use classical
machine learning models, which don't need as much data and still deliver good performance.
I mean, once you have accuracy in the 80 percent range,