28 - Artificial Intelligence II [ID:47319]

Okay. Welcome to the last lecture of AI2. What I want to do today is to continue a bit the lecture on deep learning methods in NLP and then kind of wrap up what we've done. And I want to answer any questions you have. So we've looked at deep learning methods. The first topic was basically word embeddings. The idea is that natural language processing is about word sequences, while what neural networks work on is vectors of numbers, and somehow we have to get from one to the other. And the very simple idea of one-hot, very high-dimensional vectors for language is exactly that, a very simple idea, and not very helpful.
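To make the contrast concrete, here is a tiny sketch; the 50,000-word vocabulary and the 100-dimensional embedding are my own illustrative numbers, not figures from the lecture.

```python
# One-hot versus dense representation of a single word; sizes are assumed.
import numpy as np

VOCAB_SIZE = 50_000
word_index = 17                        # hypothetical index of some word

one_hot = np.zeros(VOCAB_SIZE)         # 50,000 entries, all zero except one
one_hot[word_index] = 1.0              # very high dimensional, carries no structure

dense = np.random.default_rng(0).normal(size=100)  # a 100-dimensional embedding
```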

So word embeddings are kind of the universal first step to get from word sequences to good vector representations. And there are various ideas here, and the one that is currently most favored is one that already tries to capture some, I would say, lexical knowledge: systematic relations between man and woman, king and queen, for instance, but also various other relations, as differences of vectors. The nice thing about a vector space is that we can actually compute: we can add, subtract, take dot products and all of those kinds of things, the things that actually power neural networks. And if the word embeddings already encode or learn the necessary structure from the data they are given to train them, then that structure will actually be available later in the neural networks.
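As a toy illustration of that vector arithmetic, here is a sketch with made-up four-dimensional vectors, chosen so the analogy works out; real embeddings have hundreds of dimensions and are learned from data.

```python
# Relations like man/woman and king/queen show up as vector differences,
# so king - man + woman should land near queen.
import numpy as np

def cosine(a, b):
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# Hand-made toy "embeddings" (4-dimensional) for illustration only.
vectors = {
    "man":   np.array([1.0, 0.0, 0.2, 0.1]),
    "woman": np.array([1.0, 1.0, 0.2, 0.1]),
    "king":  np.array([1.0, 0.0, 0.9, 0.8]),
    "queen": np.array([1.0, 1.0, 0.9, 0.8]),
}

target = vectors["king"] - vectors["man"] + vectors["woman"]
best = max(vectors, key=lambda w: cosine(vectors[w], target))
print(best)  # "queen" on these toy vectors
```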

The word2vec algorithm (there are more algorithms by now) was the first real one: it essentially works by using a network where you force the information to pass through a limited keyhole. That gives you a kind of learned compression, and it gives you a nice way of making the embeddings in the hidden layer. Right. And this is how the network essentially works later.
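A minimal sketch of that keyhole idea, assuming a skip-gram-style setup in PyTorch; the vocabulary size and embedding width are placeholders, and this is not the exact network from the slides.

```python
import torch
import torch.nn as nn

VOCAB_SIZE = 10_000   # assumed vocabulary size
EMBED_DIM = 100       # the narrow hidden layer, i.e. the "keyhole"

class Word2VecSketch(nn.Module):
    def __init__(self, vocab_size, embed_dim):
        super().__init__()
        # input word -> hidden layer: this weight matrix becomes the embedding table
        self.embed = nn.Embedding(vocab_size, embed_dim)
        # hidden layer -> scores over the whole vocabulary (the context words)
        self.out = nn.Linear(embed_dim, vocab_size)

    def forward(self, center_word_ids):
        hidden = self.embed(center_word_ids)   # squeeze through the keyhole
        return self.out(hidden)                # logits for predicting context words

model = Word2VecSketch(VOCAB_SIZE, EMBED_DIM)
loss_fn = nn.CrossEntropyLoss()                # softmax over the vocabulary
```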

Okay. There are a couple of pre-trained word embeddings that you can just use off the shelf. They are usually derived from internet data. And there are word embeddings for various languages, but also for special domains. A PhD student of mine is also computing GloVe vectors for mathematical language, or scientific language, from a particular corpus, in this case the arXiv.org corpus, which is 2 million preprints of scientific papers. That is nice because in many areas, like astrophysics, it is essentially complete, so it gives you a very good overview of scientific English. Possibly scientific pidgin, but those things tend to exist.
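As a sketch of how such off-the-shelf vectors are typically read in: GloVe files are plain text with one word followed by its float components per line. The file name below is just an example of a downloaded file, not something shipped with this lecture.

```python
import numpy as np

def load_glove(path):
    """Read GloVe-format text vectors into a {word: vector} dictionary."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *values = line.rstrip().split(" ")
            vectors[word] = np.asarray(values, dtype=np.float32)
    return vectors

# glove = load_glove("glove.6B.100d.txt")  # hypothetical local file
```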

And we looked at a way of learning the embeddings together with a task, which is kind of the from-first-principles approach. But here you'll remember that we had the embedding matrix in these components as weight vectors. You can also just take word2vec or GloVe vectors there as a pre-trained word embedding and make it better for your own purposes. I'll come back to transfer learning, which is what this is, a little bit later today.
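A minimal sketch of that reuse in PyTorch, assuming you already have a vocabulary-by-dimension matrix of GloVe or word2vec vectors; the random tensor below is only a stand-in for such a matrix.

```python
import torch
import torch.nn as nn

pretrained = torch.randn(10_000, 100)   # stand-in for a loaded GloVe/word2vec matrix
embedding = nn.Embedding.from_pretrained(pretrained, freeze=False)
# freeze=False lets backpropagation fine-tune the vectors for your own task;
# freeze=True would keep them fixed as generic off-the-shelf embeddings.
```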

The second thing that we looked at was that if we have more complex NLP tasks that need more context, we need to step up from feed-forward networks to recurrent networks. And the idea in recurrent networks is that you have this kind of memory loop, which we can understand as a network for learning time series. And of course, word sequences are time series. So you feed information back; that's the cycle here, that's what you see here, the W_zz. That feeds into the next time step. It's a way of remembering stuff: the data is kind of stored in the loop for one time step.
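A minimal sketch of that loop in code, with assumed sizes, where W_zz is the recurrent weight matrix that carries the hidden state to the next time step.

```python
import numpy as np

INPUT_DIM, HIDDEN_DIM = 100, 100
rng = np.random.default_rng(0)
W_xz = rng.normal(size=(HIDDEN_DIM, INPUT_DIM)) * 0.01   # input -> hidden
W_zz = rng.normal(size=(HIDDEN_DIM, HIDDEN_DIM)) * 0.01  # hidden -> hidden (the loop)
b = np.zeros(HIDDEN_DIM)

def rnn_step(x_t, z_prev):
    # The previous state z_prev is what was "remembered" from one time step
    # ago; the new state mixes it with the current input x_t.
    return np.tanh(W_xz @ x_t + W_zz @ z_prev + b)

z = np.zeros(HIDDEN_DIM)
for x_t in rng.normal(size=(5, INPUT_DIM)):   # a toy sequence of 5 word vectors
    z = rnn_step(x_t, z)
```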

There are various backpropagation algorithms. You can use these RNN blocks, little building blocks like that, to make up bigger networks. And literally that is what happens in research now: people are building these, I would say, block architectures, where the components are relatively standard and well understood, things like simple RNNs, or the LSTMs we looked at, or attention blocks, and all of those kinds of things.
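As a sketch of that building-block style (my own example with assumed sizes, not a network from the lecture), stacking an embedding block, an LSTM block, and a linear output block:

```python
import torch
import torch.nn as nn

class BlockModel(nn.Module):
    def __init__(self, vocab_size=10_000, embed_dim=100, hidden_dim=1000):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)               # word embeddings
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)   # recurrent block
        self.head = nn.Linear(hidden_dim, vocab_size)                  # task-specific output

    def forward(self, token_ids):                  # token_ids: (batch, seq_len)
        x = self.embed(token_ids)
        out, _ = self.lstm(x)                      # hidden state for each time step
        return self.head(out)                      # e.g. next-word scores

model = BlockModel()
print(sum(p.numel() for p in model.parameters()))  # already millions of parameters
```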

You build these networks, and you can already see that, for every one of these, they look like that. And there are three weight matrices in there; if we always have something like a thousand connections, to get the right memory bandwidth if you want, then we're seeing something like three million parameters here, just in one RNN.
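The quick count behind those numbers, as a one-liner:

```python
# Three weight matrices, each connecting about a thousand units to a thousand
# units, give roughly three million learnable parameters in a single block.
n_units, n_matrices = 1000, 3
print(n_matrices * n_units * n_units)   # 3000000
```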

And if you start counting those, then you get into many millions of parameters that can be learned. And commercial-grade networks like the GPT-4 network, they actually have ...

Part of a video series
Access: Open Access
Duration: 01:32:09 min
Recording date: 2023-07-19
Uploaded: 2023-07-19 16:39:03
Language: en-US
