Welcome back everybody to Deep Learning. Thanks for tuning in. Today's topic will be the backpropagation algorithm.
So you may be interested in how we actually compute these derivatives in complex neural networks.
Let's look at a simple example.
Our true function is f(x) = (2x1 + 3x2)² + 3, that is, 2x1 plus 3x2, all squared, plus 3.
Now we want to evaluate the partial derivative of f of x at the position 1,3 with respect to x1.
There are two algorithms that can do that quite efficiently.
The first one will be finite differences.
The second one is an analytic derivative.
So we will go through both examples here.
For finite differences, the idea is that you compute the function value at some position x.
Then you add a very small increment h to x and evaluate the function there as well.
You take the difference between f(x + h) and f(x) and divide it by the value of h.
So this is actually the definition of a derivative.
It is the limit of the difference quotient (f(x + h) − f(x)) / h as we let h approach zero.
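To make this concrete, here is a minimal sketch of the forward difference in Python. The names forward_difference, f, x, i, and h are just illustrative choices, not something fixed by the lecture.

```python
import numpy as np

def forward_difference(f, x, i, h=1e-5):
    """Forward-difference approximation of the partial derivative of f
    with respect to the i-th component of x: (f(x + h*e_i) - f(x)) / h."""
    x = np.asarray(x, dtype=float)
    step = np.zeros_like(x)
    step[i] = h          # perturb only the i-th coordinate
    return (f(x + step) - f(x)) / h
```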
Now the problem is that this definition is one-sided, i.e. not symmetric around x.
So sometimes you may prefer a symmetric definition.
Instead of stepping only forwards from x, we go h/2 backwards and h/2 forwards, i.e. we evaluate at x − h/2 and x + h/2.
This centers the approximation exactly at the position x.
Then we still have to divide by h.
This is the symmetric, or central, difference definition.
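Analogously, a minimal sketch of the symmetric difference, again with purely illustrative names:

```python
import numpy as np

def central_difference(f, x, i, h=1e-5):
    """Symmetric (central) difference approximation of the partial derivative
    of f with respect to the i-th component of x:
    (f(x + (h/2)*e_i) - f(x - (h/2)*e_i)) / h."""
    x = np.asarray(x, dtype=float)
    step = np.zeros_like(x)
    step[i] = h / 2.0    # go h/2 forwards and h/2 backwards
    return (f(x + step) - f(x - step)) / h
```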
We can do this for our example.
So let's try to evaluate this.
We take our original function definition, f(x) = (2x1 + 3x2)² + 3.
We wanted to look at the position 1, 3.
Let's use the symmetric ±h/2 definition from above.
Here we set h to a small value, say 2 × 10⁻².
We plug it in and you can see that here in this row.
Now the first term is going to be (2 · (1 + 10⁻²) + 9)² + 3.
And of course in the second term we subtract the small increment instead, giving (2 · (1 − 10⁻²) + 9)² + 3.
Then we divide the difference by h.
So we will end up with approximately (124.4404 − 123.5604) / 0.02.
And this will be approximately 43.9999, so essentially 44.
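As a sanity check of these numbers, here is a small Python sketch of exactly this computation. The variable names are again just for illustration.

```python
import numpy as np

def f(x):
    # Our example function: f(x) = (2*x1 + 3*x2)^2 + 3
    return (2.0 * x[0] + 3.0 * x[1]) ** 2 + 3.0

x = np.array([1.0, 3.0])          # evaluate at the position (1, 3)
h = 2e-2                          # the small step size from the example
step = np.array([h / 2.0, 0.0])   # perturb only x1, by h/2

approx = (f(x + step) - f(x - step)) / h
print(approx)   # (124.4404 - 123.5604) / 0.02, i.e. approximately 44.0
```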
Now we can compute this for any function, even if we don't know its formal definition.
For example, we may only have a software module whose internals we cannot access.
In this case we can use finite differences to approximate the partial derivative.
In practice, we can use h on the order of 1 × 10⁻⁵,
which is appropriate for typical floating-point precision.
Depending on the precision of your computing system,
you can also determine what the appropriate value for h is going to be.
You can check that in reference number 7.
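To illustrate why the choice of h matters, a quick sweep over step sizes could look like the sketch below. It reuses the example function and position from above and compares against the exact partial derivative of 44; the layout of the loop is an assumption, not taken from the lecture.

```python
import numpy as np

def f(x):
    return (2.0 * x[0] + 3.0 * x[1]) ** 2 + 3.0

x = np.array([1.0, 3.0])
for h in [1e-1, 1e-3, 1e-5, 1e-7, 1e-9, 1e-11]:
    approx = (f(x + np.array([h, 0.0])) - f(x)) / h   # forward difference in x1
    # Large h -> truncation error dominates; tiny h -> floating-point
    # cancellation dominates. The sweet spot depends on your precision.
    print(f"h = {h:.0e}   error = {abs(approx - 44.0):.1e}")
```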
We see that this is really easy to use.
We can evaluate this on any function.
We don't need to know the formal definition, but of course it's computationally inefficient.
Imagine you want to determine the gradient, that is, the vector of all partial derivatives,
of a function whose input has a dimension of 100.
With the forward difference, this means that you have to evaluate the function 101 times to compute the entire gradient.
So this may not be such a great choice for general optimization, because it quickly becomes inefficient.
Of course it's a cool method in order to check your implementation.
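As a sketch of how such a check could look (all names here are illustrative, not a fixed API): the numerical gradient below needs exactly n + 1 function calls with the forward difference and can be compared against the gradient your own backpropagation code returns.

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Approximate the full gradient of f at x with forward differences:
    one evaluation at x plus one per dimension, i.e. n + 1 calls in total."""
    x = np.asarray(x, dtype=float)
    fx = f(x)
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        grad[i] = (f(x + step) - fx) / h
    return grad

# Typical gradient check: the relative error between the analytic gradient
# (e.g. from backpropagation) and the numerical one should be small.
# analytic = my_analytic_gradient(x)     # hypothetical analytic implementation
# numeric  = numerical_gradient(f, x)
# rel_err  = np.abs(analytic - numeric) / np.maximum(1e-8, np.abs(analytic) + np.abs(numeric))
```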