Okay. Welcome to the last AI2 lecture. What I want to do today is continue a bit with the lecture on deep learning methods in NLP and then kind of wrap up what we've done. And I want to answer any questions you have. So, we've looked at deep learning methods. The first topic was basically word embeddings. The idea is that natural language processing is about word sequences, whereas what neural networks can work with is vectors of numbers, and somehow we have to get from one to the other. The very simple idea of one-hot, very high-dimensional vectors for language is exactly that: a very simple idea, and not very helpful.
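To make that concrete, here is a minimal sketch with a hypothetical four-word vocabulary, just to show that one-hot vectors are mutually orthogonal and therefore carry no information about meaning:

```python
import numpy as np

vocab = ["king", "queen", "man", "woman"]              # toy stand-in vocabulary
one_hot = {w: np.eye(len(vocab))[i] for i, w in enumerate(vocab)}

print(one_hot["king"] @ one_hot["queen"])   # 0.0 -- different words are orthogonal
print(one_hot["king"] @ one_hot["king"])    # 1.0 -- a word only matches itself
# With a realistic vocabulary these vectors would have tens of thousands of
# dimensions, almost all of them zero.
```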
So word embeddings are kind of the universal first step to get from word sequences to good vector representations. There are various ideas here, and the one that's currently most favored is one that already tries to capture some, I would say, lexical knowledge: systematic relations between man and woman, king and queen, for instance, but also various other relations, expressed as differences of vectors. The nice thing about a vector space is that we can actually compute: we can add, subtract, take dot products and all those kinds of things, the operations that actually power neural networks. And if the word embeddings already learn the necessary structure from the data they are trained on, then that structure will actually be available later in the neural networks.
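Here is a small sketch of that vector arithmetic, assuming `vectors` is a dictionary mapping words to numpy arrays (for instance loaded from pre-trained embeddings); the helper names are my own:

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity, the usual way to compare embedding vectors.
    return np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v))

def analogy(vectors, a, b, c):
    # "a is to b as c is to ?": compute b - a + c and return the closest word.
    target = vectors[b] - vectors[a] + vectors[c]
    candidates = (w for w in vectors if w not in {a, b, c})
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# With good embeddings, analogy(vectors, "man", "king", "woman") should come
# out as "queen".
```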
The word2vec algorithm (there are more algorithms by now, but this was really the first) works essentially by using a network where you force the information to pass through a limited keyhole. That gives you a kind of learned compression, and the hidden layer gives you a nice place to read off the embeddings. Right. And this is essentially how the network works.
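A minimal sketch of that keyhole idea, written in PyTorch under the assumption of a 10,000-word vocabulary and 100-dimensional embeddings; this is a simplified skip-gram-style model, not the full word2vec training pipeline:

```python
import torch
import torch.nn as nn

class SkipGramLike(nn.Module):
    """Word2vec-style network: a word index is squeezed through a small hidden
    layer (the 'keyhole') and then used to predict a context word. The input
    weights of that hidden layer are exactly the word embeddings we are after."""

    def __init__(self, vocab_size=10000, embed_dim=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)   # V x d keyhole
        self.out = nn.Linear(embed_dim, vocab_size)        # d x V back to the vocabulary

    def forward(self, center_word_ids):
        hidden = self.embed(center_word_ids)   # the low-dimensional code
        return self.out(hidden)                # scores over possible context words

model = SkipGramLike()
# Training would minimise cross-entropy between these scores and the observed
# context words; afterwards model.embed.weight holds the word embeddings.
```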
Okay. There are a couple of pre-trained embeddings that you can just use off the shelf. They're usually derived from internet data. And there are word embeddings for various languages, but also for special domains. A PhD student of mine is using computed GloVe vectors for mathematical language, or scientific language, from a particular corpus, in this case the arXiv.org corpus, which is about two million preprints of scientific papers. That is nice because in many areas, like astrophysics, it's essentially complete, so it gives you a very good overview of scientific English. Possibly of a scientific pidgin, but such things do exist.
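If the gensim package is installed, one of its stock downloads is a quick way to get such off-the-shelf vectors; the model name below is one of gensim's bundled GloVe models and may change over time:

```python
import gensim.downloader as api

glove = api.load("glove-wiki-gigaword-100")   # pre-trained GloVe, 100 dimensions

print(glove["physics"][:5])                   # first few components of one vector
print(glove.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```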
And we looked at a way of learning the embeddings together with a task, which is kind of the from-first-principles approach. But you'll remember that there we had the embedding matrix, with its components as weight vectors. You can also just take word2vec or GloVe vectors there as a pre-trained word embedding and make it better for your own purposes. I'll come back to transfer learning, which is what this is, a little bit later today.
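As a sketch of that kind of transfer learning, here is a toy PyTorch classifier whose embedding layer starts from a pre-trained matrix and is then fine-tuned together with the task; the random matrix below merely stands in for real GloVe or word2vec vectors:

```python
import torch
import torch.nn as nn

# `pretrained` is assumed to be a (vocab_size x embed_dim) tensor of GloVe or
# word2vec vectors, aligned with your own vocabulary indexing.
pretrained = torch.randn(10000, 100)   # placeholder for illustration only

class SentimentNet(nn.Module):
    """Toy task network: the embedding layer is initialized from pre-trained
    vectors and fine-tuned with the task (freeze=False)."""

    def __init__(self, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding.from_pretrained(pretrained, freeze=False)
        self.classify = nn.Linear(pretrained.shape[1], num_classes)

    def forward(self, word_ids):
        vectors = self.embed(word_ids)   # (batch, seq_len, embed_dim)
        pooled = vectors.mean(dim=1)     # crude bag-of-words pooling
        return self.classify(pooled)
```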
The second thing we looked at was that if we have more complex NLP tasks that need more context, we need to step up from feed-forward networks to recurrent networks. The idea in recurrent networks is that you have this kind of memory loop, which we can understand as a mechanism for learning time series, and of course word sequences are time series. So you feed information around the cycle you see here, through the recurrent weight matrix, into the next time step. It's a way of remembering stuff: the data is kind of stored in the loop for one time step. There are various backpropagation algorithms for training these. You can then use these RNN blocks as little components to make up bigger networks, and that is literally what happens in research now. People are building what I would call block architectures, where the components are relatively standard, well-understood things: simple RNNs, the LSTMs we looked at, attention blocks, and all of those kinds of things.
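A minimal sketch of that memory loop, with hypothetical weight matrices, to show where the recurrence actually lives:

```python
import numpy as np

def simple_rnn(inputs, W_xh, W_hh, b_h):
    """Minimal RNN cell unrolled over time: the hidden state h is the loop
    that carries information from one time step to the next."""
    h = np.zeros(W_hh.shape[0])
    hidden_states = []
    for x_t in inputs:                           # one word embedding per time step
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
        hidden_states.append(h)
    return hidden_states
```

LSTMs and attention blocks keep the same interface, vectors in and vectors out, which is what makes this kind of block-wise composition work.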
You build these networks, and you can already see that each of these blocks is sizeable: there are three weight matrices in there, and if we always have on the order of a thousand connections, to get the right memory bandwidth if you want, then we're seeing something like three million parameters just in one RNN block. If you start counting those, you get into many millions of parameters that can be learned. And commercial-grade networks like the GPT-4 network actually have orders of magnitude more than that.