8 - Deep Learning - Feedforward Networks Part 3 [ID:13514]

Hi, welcome everybody to deep learning. Thanks for tuning in. Today's topic will be the backpropagation algorithm. The stuff that works best is really simple.

So you may be interested in how we actually compute these derivatives in complex neural networks. Let's look at a simple example. In this simple example, we want to evaluate the following function: our true function is f(x₁, x₂) = (2x₁ + 3x₂)² + 3, and we want to evaluate the partial derivative of f with respect to x₁ at the position (1, 3). There are two approaches that can do this quite efficiently: the first one is finite differences, and the second one is analytic differentiation. We will go through both here.

Now for finite differences, the idea is that you compute the function value at some position x plus a very small increment h, and you also compute the original function value f(x). You take the difference between the two and divide by the same value h. This is essentially the definition of the derivative: the limit of (f(x + h) - f(x)) / h as h approaches 0. Now the problem is that this is not symmetric, so we sometimes prefer a symmetric definition: instead of evaluating at x and x + h, we go half an h back and half an h to the front. This allows us to estimate the derivative exactly at the position x, and we still divide by h. This is the symmetric (central) difference. And that was my 1987 diploma thesis, which was all about that.
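Written out as formulas (with e₁ denoting the unit vector along x₁ and h a small step size), the two definitions just described are:

\[
\frac{\partial f}{\partial x_1}(\mathbf{x}) \approx \frac{f(\mathbf{x} + h\,\mathbf{e}_1) - f(\mathbf{x})}{h}
\qquad \text{(one-sided, forward difference)}
\]

\[
\frac{\partial f}{\partial x_1}(\mathbf{x}) \approx \frac{f(\mathbf{x} + \tfrac{h}{2}\,\mathbf{e}_1) - f(\mathbf{x} - \tfrac{h}{2}\,\mathbf{e}_1)}{h}
\qquad \text{(symmetric, central difference)}
\]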

Now if you do that, we can do it in our example. So let's try to evaluate this. And we take

our original definition 2x1 plus 3x2 to the power of 2 plus 3. We wanted to look at the

position 1, 3. And let's just calculate this. Let's use the plus half h definition up above

here. So we set h to a small value. Let's say a small value is 2 times 10 to the power

of minus 2. And we plug it in. You can see that here in this row. So this is going to

be 2 times 1 plus half of our h plus 9 to the power of 2 plus 3. And of course, we also

have to subtract our small value in the second term. And we divide by the small value as

well. So this then lets us compute the following numbers. So we will end up with approximately

124.4404 minus 123.5604. And this will be approximately 43.999. So we can compute for
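To make this concrete, here is a minimal Python sketch of the calculation above; the function and variable names are chosen for illustration and are not from the lecture slides:

```python
# Symmetric (central) finite difference for f(x1, x2) = (2*x1 + 3*x2)**2 + 3
# with respect to x1 at the position (1, 3), using h = 2e-2 as in the example.

def f(x1, x2):
    return (2 * x1 + 3 * x2) ** 2 + 3

h = 2e-2
x1, x2 = 1.0, 3.0

upper = f(x1 + h / 2, x2)   # ~ 124.4404
lower = f(x1 - h / 2, x2)   # ~ 123.5604

approx = (upper - lower) / h
print(approx)               # ~ 44, i.e. the approximately 43.999 stated above, up to rounding
```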

So we can compute this for any function, even if we don't know its definition: if we only have it as a module that we cannot access, we can still use finite differences to approximate the partial derivative.

In practical use, we choose h in the range of 1 × 10⁻⁵, which is appropriate for floating-point precision. Actually, depending on the precision of your compute system, you can also determine what the appropriate value of h should be; you can check that in reference number 7.
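As an illustration of how the machine precision enters, here is a small sketch; the cube-root rule shown is a common heuristic for central differences, not necessarily the exact recipe from reference 7:

```python
import numpy as np

# Heuristic (assumption for illustration): for the symmetric (central) difference,
# an h on the order of the cube root of the machine epsilon, scaled by |x|,
# roughly balances truncation and rounding error.
eps = np.finfo(np.float64).eps          # ~ 2.2e-16 for double precision
x = 1.0
h = eps ** (1 / 3) * max(abs(x), 1.0)
print(h)                                # ~ 6e-6, close to the 1e-5 mentioned above
```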

We see that this approach is really easy to use: we can evaluate it on any function without needing to know its formula. But of course, it is computationally very inefficient.

Imagine you want to determine the gradient, that is, the set of all partial derivatives, of a function whose input has dimension 100. This means that you have to evaluate the function 101 times in order to compute the entire gradient. So this may not be such a great choice for general optimization, because it can become very inefficient. But of course, it is a very, very useful method to check your implementation: imagine you implemented the analytic version and made a mistake somewhere. Then you can use finite differences as a trick to check whether your analytic derivative is correctly implemented. This is also something you will learn in the exercises here. It is really useful whenever you want to verify such derivative computations.
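A minimal gradient-check sketch along these lines might look as follows (all names are illustrative); note that the loop over the input dimensions is exactly what makes the numerical gradient expensive for high-dimensional functions:

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Approximate the gradient of f at x with symmetric finite differences."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h / 2
        grad[i] = (f(x + step) - f(x - step)) / h
    return grad

# Example: check a hand-derived gradient of f(x) = (2*x[0] + 3*x[1])**2 + 3.
def f(x):
    return (2 * x[0] + 3 * x[1]) ** 2 + 3

def analytic_gradient(x):
    return np.array([4 * (2 * x[0] + 3 * x[1]),
                     6 * (2 * x[0] + 3 * x[1])])

x = np.array([1.0, 3.0])
num = numerical_gradient(f, x)          # ~ [44, 66]
ana = analytic_gradient(x)              # exactly [44, 66]

# A small relative error (e.g. below 1e-6) indicates the analytic gradient is correct.
rel_err = np.max(np.abs(num - ana) / np.maximum(np.abs(num) + np.abs(ana), 1e-8))
print(rel_err)
```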

Then I thought the most exciting thing is to solve the riddles of the universe. That

means you have to become a physicist. Now, the analytic gradient we can derive by using a set of analytic differentiation rules. The first rule is that the derivative of a constant is zero. Next, differentiation is a linear operator, which means we can rearrange it: if you have, for example, a sum of different components, you can differentiate each component separately. Then we also know the derivatives of monomials: if you have some x to the power of n, the derivative is n times x to the power of n minus 1. And finally there is the chain rule for nested functions, which is essentially the key ingredient that we need for backpropagation.
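Applying these rules to the example from above, with the power rule on the outer square and the chain rule on the inner linear term, gives the analytic partial derivative and confirms the finite-difference estimate:

\[
f(x_1, x_2) = (2x_1 + 3x_2)^2 + 3,
\qquad
\frac{\partial f}{\partial x_1} = 2\,(2x_1 + 3x_2)\cdot 2 = 4\,(2x_1 + 3x_2),
\]

\[
\frac{\partial f}{\partial x_1}\bigg|_{(1,3)} = 4\,(2 \cdot 1 + 3 \cdot 3) = 44,
\]

which matches the finite-difference estimate of approximately 43.999 computed earlier.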

Part of a video series
Accessible via: Open Access
Duration: 00:22:03 min
Recording date: 2020-04-18
Uploaded on: 2020-04-19 00:06:09
Language: en-US

Deep Learning - Feedforward Networks Part 3

This video introduces the basics of the backpropagation algorithm.

Video References:
Lex Fridman's Channel
Tacoma Narrows Bridge Collapse
Credit: Stillman Fires Collection; Tacoma Fire Department (Video) - Castle Films (Sound) 1940 - This content is licensed under Creative Commons. Please visit here for more information about the license(s).

Further Reading:
A gentle Introduction to Deep Learning 

Tags
backpropagation, artificial intelligence, deep learning, machine learning, pattern recognition, Feedforward Networks, Gradient descent