Welcome everybody to our next video on deep learning.
So today we want to talk again about feed-forward networks, part four, and the main
focus today will be the layer abstraction.
Of course we talked about those neurons and individual nodes, but this grows really complex
for larger networks.
So we want to introduce this layer concept also in our computation of the gradients.
So yeah, this is really useful because we can then talk directly about gradients of
entire layers and don't need to go through all of the individual nodes.
So how do we express this?
And let's recall what our single neuron is doing.
The single neuron is essentially computing an inner product of its weight vector with the input.
And by the way, we are now absorbing the bias into this notation: we expand the x vector
by one additional element that is one, and this allows us to describe the bias also in
the inner product that is shown on this slide here.
So let's magnify this a bit so that you can read the formulas better.
And this is really nice because then you can see that the output prediction Y hat is just
an inner product.
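Written out in formulas (my own notation, reconstructed from the description rather than copied from the slide):

\hat{y} = \mathbf{w}^{\mathsf{T}} \mathbf{x},
\qquad \mathbf{x} = (x_1, \dots, x_n, 1)^{\mathsf{T}},
\qquad \mathbf{w} = (w_1, \dots, w_n, b)^{\mathsf{T}}

so the last entry of the weight vector plays the role of the bias b.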
Now let's think about the case that we have M neurons, which means that we get outputs
y hat 1 up to y hat M, and all of them are inner products.
So if you bring this into a vector notation, you can see that the vector y hat is nothing
else than a matrix multiplication of x with this matrix W. And you see that a fully
connected layer is nothing else than a matrix multiplication.
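In formulas (again my own notation, assuming the M weight vectors are stacked as the rows of W):

\hat{\mathbf{y}} = \mathbf{W} \mathbf{x},
\qquad \mathbf{W} = \begin{pmatrix} \mathbf{w}_1^{\mathsf{T}} \\ \vdots \\ \mathbf{w}_M^{\mathsf{T}} \end{pmatrix},
\qquad \hat{y}_m = \mathbf{w}_m^{\mathsf{T}} \mathbf{x} \quad \text{for } m = 1, \dots, M.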
Of course, we are building on all these great abstractions that people have invented over
the centuries, such as matrix multiplication.
So we can essentially represent arbitrary connections and topologies using this fully
connected layer.
And then we also apply a point-wise non-linearity such that we really get this non-linear effect
here.
Now the nice thing about the matrix notation is of course that we can now describe the
derivative of the entire layer using matrix calculus.
So our fully connected layer would then get the following configuration.
Let's consider an input with three elements.
Then, for every neuron, let's say we have two neurons, we get a weight vector, we multiply
the two, and in the forward pass we have simply determined this y hat.
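A minimal NumPy sketch of this forward pass (the concrete numbers and the sigmoid non-linearity are just assumptions for illustration, and the bias is assumed to be absorbed into the weights as described above):

import numpy as np

x = np.array([0.5, -1.2, 3.0])            # three input elements
W = np.array([[ 0.2, 0.4, -0.1],          # weight vector of neuron 1
              [-0.3, 0.1,  0.5]])         # weight vector of neuron 2

z = W @ x                                 # fully connected layer: a single matrix multiplication
y_hat = 1.0 / (1.0 + np.exp(-z))          # point-wise non-linearity (sigmoid as an example)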
For this module, if we want to compute the gradients, then we need exactly two gradients,
and they are the same gradients as we already mentioned: we need the partial derivative
with respect to the weights W, and the partial derivative with respect to x for the
backpropagation, to pass it on to the next module.
So how does this evolve?
Well, we have the layer that is y hat equals W x, so there is a matrix multiplication in
the forward pass, and then we need the gradient with respect to the weights.
And now you can see that what we essentially need here is a matrix derivative, and the
derivative of y hat with respect to W is going to be simply x transpose.
So if we have the loss gradient that comes into our module, the update to our weights is
going to be this loss vector multiplied with x transpose.
So we have some loss vector and x transpose, which essentially means that you have two
vectors whose outer product gives a matrix of the same shape as W.
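A minimal sketch of these two gradients for the linear part y hat = W x (delta stands for the incoming loss gradient with respect to y hat; all numbers are made up for illustration):

import numpy as np

x = np.array([0.5, -1.2, 3.0])            # input stored from the forward pass
W = np.array([[ 0.2, 0.4, -0.1],
              [-0.3, 0.1,  0.5]])         # weights of the two neurons
delta = np.array([0.1, -0.4])             # loss gradient w.r.t. y hat, coming from the following module

grad_W = np.outer(delta, x)               # dL/dW = delta x^T: outer product of two vectors, same shape as W
grad_x = W.T @ delta                      # dL/dx = W^T delta: passed on to the next module in the backward pass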
Deep Learning - Feedforward Networks Part 4
This video explains backpropagation at the level of layer abstraction.
Video References:
Lex Fridman's Channel
Further Reading:
A gentle Introduction to Deep Learning