
Welcome back to deep learning! Today I want to show you one alternative solution to the vanishing gradient problem in recurrent neural networks. As you have already noticed with long temporal contexts, we need something better, so we will talk about long short-term memory units, LSTMs. They were introduced by Hochreiter and Schmidhuber and published in 1997, and they were designed to solve this vanishing gradient problem for long-term dependencies. The main idea is that you introduce gates that control writing to and accessing the memory in an additional cell state.

So let's have a look into the LSTM unit. One main feature you see here is that we now have essentially two things that could be considered a hidden state: we have the cell state C and the hidden state h. Again we have some input x; then we have quite a few activation functions that are somehow combined, and in the end we produce some output y_t. But this unit is much more complex than the Elman cell we have seen previously. Okay, so what are the main features?

The LSTM gets some input x_t and produces a hidden state. It also has the cell state, which we will look at in a little more detail in the next couple of slides, and from these it produces the output y_t. Now we have several gates, and the gates are essentially used to control the flow of information. There is a forget gate, which is used to forget old information in the cell state. Then we have the input gate, which essentially decides what new input enters the cell state. From these we then compute the updated cell state and the updated hidden state.

So let's look into the workflow. We have the cell state after each time point t, and the cell state undergoes only linear changes: there is no activation function on its path, only a multiplication and an addition. So the cell state can flow through the unit unchanged, and it can stay constant over multiple time steps.

Now we want to operate on the cell state, and we do that with several gates. The first one is going to be the forget gate. The key idea here is that we want to forget information from the cell state; in another step we then want to think about how to actually put new information into the cell state, which is going to be something like memorizing things. So the forget gate f_t controls how much of the previous cell state is forgotten. You can see that f_t is computed by a sigmoid function, so its values lie somewhere between 0 and 1, and it is essentially computed as a matrix multiplied with the concatenation of the hidden state and x_t, plus some bias. This is then multiplied onto the cell state, so we decide which parts of the cell state vector to forget and which ones to keep.
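Written out, the forget gate described here takes the usual form (here [h_{t-1}, x_t] denotes the concatenation of the previous hidden state and the input; the names W_f and b_f for the weight matrix and bias are my labels, not necessarily the ones on the slide):

f_t = \sigma(W_f \, [h_{t-1}, x_t] + b_f)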

Now we also need to put in new information, and for the new information we have to somehow decide what to write into the cell state. Here we need two activation functions. The first one we call i_t, and it is also produced by a sigmoid activation function: again a matrix multiplication of the hidden state concatenated with the input, plus some bias, with the sigmoid as the non-linearity. Remember, this value is going to be between 0 and 1, so you could argue that i_t is kind of selecting something.
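Analogously to the forget gate, this input gate can be written as (W_i and b_i are again my naming for the corresponding weight matrix and bias):

i_t = \sigma(W_i \, [h_{t-1}, x_t] + b_i)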

Then we have some C̃_t, which is a kind of candidate update state produced by the tangens hyperbolicus. It takes some weight matrix W_C that is multiplied with the concatenation of the hidden and input vectors, plus some bias b_C.
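In formula form, this candidate state reads (W_C and b_C as just described; because of the tanh, its entries lie between -1 and 1):

\tilde{C}_t = \tanh(W_C \, [h_{t-1}, x_t] + b_C)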

So essentially, this gate i_t is then multiplied with the intermediate cell state C̃_t. We could say that the tangens hyperbolicus produces some new cell state content, and then we select via i_t which of its entries should be added to the current cell state. So we multiply the newly produced C̃_t with i_t and add the result to the cell state C_t.

Now, as we have just seen, the complete update for the cell state is the previous cell state scaled by the forget gate plus the new candidate state scaled by the input gate.
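Putting the pieces together, the cell state update just described would read (with \odot denoting element-wise multiplication and the weight names from above):

C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t

As a minimal sketch of just this cell-state machinery, assuming plain NumPy and made-up parameter names (the output gate and the new hidden state h_t are not computed here, since they are covered later in the video):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_cell_state_update(x_t, h_prev, c_prev, W_f, b_f, W_i, b_i, W_c, b_c):
    """One LSTM cell-state update, following the steps from the video.

    Parameter names are illustrative only, not taken from the lecture slides.
    """
    z = np.concatenate([h_prev, x_t])   # concatenation [h_{t-1}, x_t]

    f_t = sigmoid(W_f @ z + b_f)        # forget gate, entries in (0, 1)
    i_t = sigmoid(W_i @ z + b_i)        # input gate, entries in (0, 1)
    c_tilde = np.tanh(W_c @ z + b_c)    # candidate cell state, entries in (-1, 1)

    # Forget parts of the old cell state, then add the selected new information.
    return f_t * c_prev + i_t * c_tilde

# Toy example: 3-dimensional input, 4-dimensional hidden/cell state.
rng = np.random.default_rng(0)
n_x, n_h = 3, 4
W_f, W_i, W_c = (0.1 * rng.standard_normal((n_h, n_h + n_x)) for _ in range(3))
b_f = b_i = b_c = np.zeros(n_h)
c_t = lstm_cell_state_update(rng.standard_normal(n_x), np.zeros(n_h),
                             np.zeros(n_h), W_f, b_f, W_i, b_i, W_c, b_c)
print(c_t.shape)  # (4,)
```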

Part of a video series:
Accessible via: Open Access
Duration: 00:09:32 min
Recording date: 2020-10-12
Uploaded on: 2020-10-12 19:16:20
Language: en-US

Deep Learning - Recurrent Neural Networks Part 3

This video discusses Long Short-Term Memory Units.

For reminders to watch new videos, follow on Twitter or LinkedIn.

Further Reading:
A gentle Introduction to Deep Learning
