25 - Artificial Intelligence II [ID:52597]

Good morning everybody.

Last lecture, are you excited?

The only correct answer.

Okay, then let's quickly recap what we talked about Tuesday so we can get to the interesting stuff.

Recurrent neural networks: the idea is pretty simple.

We just have one layer which basically generates two outputs, the one that we're actually interested in and a hidden vector which we concatenate with the next input and just feed back in.

So we have a combination of the actual input x_t and the previous hidden state of this particular layer, z_{t-1}.
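
As a minimal sketch of that recurrence (not from the lecture, just an illustration in PyTorch with made-up sizes; the names rnn_step, W_xz, W_zz, W_zy are mine):

```python
import torch

def rnn_step(x_t, z_prev, W_xz, W_zz, b_z, W_zy, b_y):
    # combine the current input x_t with the previous hidden state z_prev
    z_t = torch.tanh(W_xz @ x_t + W_zz @ z_prev + b_z)  # hidden vector, fed back in at the next step
    y_t = W_zy @ z_t + b_y                               # the output we are actually interested in
    return z_t, y_t

# toy sizes (hypothetical): 4-dim input, 8-dim hidden state, 3-dim output
W_xz, W_zz, b_z = torch.randn(8, 4), torch.randn(8, 8), torch.zeros(8)
W_zy, b_y = torch.randn(3, 8), torch.zeros(3)

z = torch.zeros(8)                 # initial hidden state
for x in torch.randn(5, 4):        # a toy sequence of 5 inputs
    z, y = rnn_step(x, z, W_xz, W_zz, b_z, W_zy, b_y)
```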

If we want to train that, we basically just unroll the network over the entire sequence that we fed in as an input and then do backpropagation through the whole thing.
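
In code, "unroll and backpropagate through the whole thing" might look like this (a sketch assuming PyTorch; the RNNCell, the sizes and the squared-error loss are my choices, not the lecture's):

```python
import torch

cell = torch.nn.RNNCell(input_size=4, hidden_size=8)   # the one recurrent layer, reused at every step
readout = torch.nn.Linear(8, 3)

xs = torch.randn(5, 4)        # toy input sequence of 5 steps
targets = torch.randn(5, 3)   # toy target outputs, one per step

z = torch.zeros(1, 8)         # initial hidden state
loss = torch.tensor(0.0)
for t in range(xs.shape[0]):  # unrolling the network over the entire sequence
    z = cell(xs[t].unsqueeze(0), z)
    loss = loss + ((readout(z) - targets[t]) ** 2).mean()

loss.backward()               # backpropagation through the whole unrolled computation (BPTT)
```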

The annoying thing obviously is that this means, a priori, an RNN can only look into the past of the sequence that we feed in.

So if we want to also be able to make choices based on future input for a particular sequence element that we're interested in, the typical solution is bidirectional RNNs, which is just a fancy way of saying we take two RNNs, one that runs forward in time and one that runs backward in time, and at every step concatenate the outputs of both of them for that particular input.

That gives us bidirectional RNNs.
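
A sketch of that idea (assuming PyTorch; two independent RNNCells with made-up sizes, which is not necessarily how the lecture or any particular library implements it):

```python
import torch

fwd = torch.nn.RNNCell(input_size=4, hidden_size=8)   # runs forward in time
bwd = torch.nn.RNNCell(input_size=4, hidden_size=8)   # runs backward in time

xs = torch.randn(5, 4)                                 # toy sequence of 5 inputs

z_f, fwd_states = torch.zeros(1, 8), []
for t in range(5):                                     # forward pass over the sequence
    z_f = fwd(xs[t].unsqueeze(0), z_f)
    fwd_states.append(z_f)

z_b, bwd_states = torch.zeros(1, 8), [None] * 5
for t in reversed(range(5)):                           # same sequence, reversed in time
    z_b = bwd(xs[t].unsqueeze(0), z_b)
    bwd_states[t] = z_b

# at every step, concatenate the outputs of both directions for that particular input
bi_states = [torch.cat([f, b], dim=1) for f, b in zip(fwd_states, bwd_states)]
```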

Now we still have this slightly annoying problem of the vanishing gradients, which is just a fancy term to say that if I do backpropagation through sufficiently many layers, the gradients become smaller and smaller and smaller, and at some point the learning effect massively diminishes in the earlier layers of the network.

And if I have an RNN which I unroll over an entire sequence, conceptually for backpropagation that just means I have a whole bunch of layers and the gradients vanish through the backpropagation process.
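
To make "vanish" slightly more concrete (a standard back-of-the-envelope argument, not quoted from the slides): unrolling over T steps, the chain rule applied to the hidden states gives

\[
\frac{\partial L}{\partial z_1} \;=\; \frac{\partial L}{\partial z_T} \prod_{t=2}^{T} \frac{\partial z_t}{\partial z_{t-1}},
\]

so if every Jacobian factor in the product has norm below 1, the gradient that reaches the earliest time steps shrinks roughly exponentially with the sequence length T.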

So one possible solution for that is LSTMs, which is just a fancy technique of basically replacing the simple hidden vector that I use recursively at every particular time step with something smarter, namely a new state vector which, instead of just being multiplied the way that neural networks usually work, is updated additively, to avoid the gradients vanishing.

Conceptually, the way that works is that I have these three gates, quote unquote, the forget gate, the input gate and the output gate, which basically govern which components of the current state vector I keep, which ones I drop, which ones I modify based on the current time step, and so on and so forth.
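
A sketch of one LSTM step with the gates written out (assuming PyTorch; I use the standard names h for the hidden state and c for the additively updated cell state, which may not match the lecture's notation, and the parameter stacking is my own convention):

```python
import torch

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    n = h_prev.shape[0]
    pre = W @ x_t + U @ h_prev + b                 # pre-activations of all gates at once
    f = torch.sigmoid(pre[0 * n:1 * n])            # forget gate: which components of the state to keep
    i = torch.sigmoid(pre[1 * n:2 * n])            # input gate: which components to update
    o = torch.sigmoid(pre[2 * n:3 * n])            # output gate: what to expose as the hidden state
    g = torch.tanh(pre[3 * n:4 * n])               # candidate values from the current time step
    c_t = f * c_prev + i * g                       # additive update of the cell state
    h_t = o * torch.tanh(c_t)                      # hidden state passed on to the next step
    return h_t, c_t

# toy sizes (hypothetical): 4-dim input, 8-dim hidden/cell state
n, d = 8, 4
W, U, b = torch.randn(4 * n, d), torch.randn(4 * n, n), torch.zeros(4 * n)
h, c = torch.zeros(n), torch.zeros(n)
for x in torch.randn(5, d):
    h, c = lstm_step(x, h, c, W, U, b)
```

The line `c_t = f * c_prev + i * g` is exactly where the "something additive" from above lives: the cell state is carried forward by addition rather than being squashed through a multiplication at every step, which is what keeps the gradients from vanishing as quickly.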

Then sequence-to-sequence models: the typical application, which is very instructive when we just want to understand how these things work, is neural machine translation, i.e. feed in a sentence in one language and expect to get the same sentence in a different language out. But of course, conceptually the whole thing works for any kind of problem where I need to convert some input sequence into some output sequence without a clear one-to-one correspondence between the individual elements of the input sequence and the output sequence, neural machine translation being exactly such a problem.

And here one nice technique is the encoder-decoder architecture, which is: I take two LSTMs, i.e. two recurrent neural networks, one of which only serves to encode the input sequence. That gives me, after I have fed the entire sequence in, some hidden vector, and then I take a different LSTM model as the decoder, which gets fed the hidden state of the input network and then generates output steps until I have the sequence that I'm actually interested in.

So it's basically just a concatenation of two LSTM networks, one that takes care of encoding the input and one that takes care of generating the output.
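
A sketch of that encoder-decoder wiring (assuming PyTorch; the LSTMCell sizes, the fixed number of decoding steps and the zero start symbol are placeholders, not the lecture's setup — in practice decoding stops at an end-of-sequence token):

```python
import torch

enc = torch.nn.LSTMCell(input_size=4, hidden_size=8)   # encoder: only digests the input sequence
dec = torch.nn.LSTMCell(input_size=3, hidden_size=8)   # decoder: generates the output sequence
readout = torch.nn.Linear(8, 3)                         # maps decoder state to an output element

src = torch.randn(6, 4)               # toy (embedded) input sequence

h, c = torch.zeros(1, 8), torch.zeros(1, 8)
for t in range(src.shape[0]):         # encode: feed the entire input in, keep only the final state
    h, c = enc(src[t].unsqueeze(0), (h, c))

y = torch.zeros(1, 3)                 # placeholder start symbol
outputs = []
for _ in range(5):                    # decode: start from the encoder's state, feed outputs back in
    h, c = dec(y, (h, c))
    y = readout(h)
    outputs.append(y)
```

The design point is simply that the encoder's final state is the only thing handed over: the decoder starts from it and then runs as its own recurrent generator.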
