Welcome back to deep learning. Today I want to show you one alternative solution to the vanishing gradient problem in recurrent neural networks. As you already noticed, long temporal contexts are hard to capture, so we will talk about long short-term memory units (LSTMs). They were introduced by Hochreiter and Schmidhuber and published in 1997, and they were designed to solve the vanishing gradient problem for long-term dependencies. The main idea is that you introduce gates that control writing to and accessing the memory in additional state cells. So
let's have a look into the LSTM unit. You see here that one main feature is that we now have essentially two things that could be considered a hidden state: the cell state C and the hidden state h. Again we have some input x, then we have quite a few activation functions, and we somehow combine them, and in the end we produce some output y_t. But this unit is much more complex than what we've seen previously in the Elman cell. Okay, so what are the main features?
The LSTM gets some input x_t and produces a hidden state. It also has the cell state, which we'll look at in a little more detail in the next couple of slides, and it produces the output y_t. Now we have several gates, and the gates are essentially used to control the flow of information. There's a forget gate, which is used to forget old information in the cell state. Then we have the input gate, which essentially decides what new information enters the cell state, and from this we then compute the updated cell state and the updated hidden
state. So let's look into the workflow. We have the cell state after each time point t, and the cell state undergoes only linear changes, so there is no activation function on it. You see there is only a multiplication and an addition on the path of the cell state. So the cell state can flow through the unit unchanged, and it can remain constant over multiple time steps. Now we want to
operate on the cell state, and we do that with several gates. The first one is going to be the forget gate. The key idea here is that we want to forget information from the cell state; in another step we then think about how to actually put new information into the cell state, which is going to be like
memorizing things. So the forget gate f_t controls how much of the previous cell state is forgotten. You can see that f_t is computed by a sigmoid function, so its entries lie between 0 and 1, and it is essentially computed as a matrix multiplication with the concatenation of the hidden state and x_t, plus some bias. This is then multiplied element-wise with the cell state, so we decide which parts of the cell state vector to forget and which ones to keep.
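To make this concrete, here is a minimal NumPy sketch of the forget gate. The weight name W_f, the bias b_f, and the toy dimensions are illustrative choices of mine, not something fixed by the lecture.

```python
import numpy as np

def sigmoid(z):
    # logistic function: squashes every entry into the range (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# toy dimensions, purely for illustration
hidden_size, input_size = 4, 3

# hypothetical forget-gate parameters
W_f = np.random.randn(hidden_size, hidden_size + input_size)  # weight matrix
b_f = np.zeros(hidden_size)                                    # bias

h_prev = np.random.randn(hidden_size)   # previous hidden state h_{t-1}
x_t    = np.random.randn(input_size)    # current input x_t
C_prev = np.random.randn(hidden_size)   # previous cell state C_{t-1}

# f_t = sigmoid(W_f [h_{t-1}, x_t] + b_f): every entry lies between 0 and 1
f_t = sigmoid(W_f @ np.concatenate([h_prev, x_t]) + b_f)

# element-wise multiplication decides which parts of C_{t-1} are kept
C_kept = f_t * C_prev
```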
Now we also need to put in new information, and for the new information we have to somehow decide what to write into the cell state. So here we need two activation functions. One is what we call i_t, which is also produced by a sigmoid activation function: again a matrix multiplication with the hidden state concatenated with the input, plus some bias, and the sigmoid as non-linearity. Remember, this value is going to be between 0 and 1, so you could argue that i_t is kind of selecting something. Then we have some C̃_t, which is a kind of candidate update state produced by the hyperbolic tangent; it takes a weight matrix W_C that is multiplied with the concatenation of the hidden and input vectors, plus some bias b_C. So essentially the hyperbolic tangent produces a candidate cell state, and via i_t we select which of its entries should be added to the current cell state. So we multiply the newly produced C̃_t with i_t and add it to the cell state C_t.
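Continuing the sketch from above (with W_i, b_i, W_C, and b_C again being hypothetical parameter names), the input gate and the candidate cell state could be computed like this:

```python
# hypothetical parameters of the input gate and the candidate state
W_i = np.random.randn(hidden_size, hidden_size + input_size)
b_i = np.zeros(hidden_size)
W_C = np.random.randn(hidden_size, hidden_size + input_size)
b_C = np.zeros(hidden_size)

hx = np.concatenate([h_prev, x_t])   # concatenation [h_{t-1}, x_t]

# input gate i_t: sigmoid output between 0 and 1, selects what gets written
i_t = sigmoid(W_i @ hx + b_i)

# candidate cell state C~_t: hyperbolic tangent, entries between -1 and 1
C_tilde = np.tanh(W_C @ hx + b_C)
```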
Now we can update the cell state. As we've just seen, the complete update is C_t = f_t * C_{t-1} + i_t * C̃_t, where the multiplications are element-wise: we forget parts of the old cell state and add the newly selected information.
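In the running sketch this is a single line, combining the variables from the two snippets above:

```python
# complete cell state update: keep what the forget gate allows, add the selected new information
C_t = f_t * C_prev + i_t * C_tilde
```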