47 - Lecture_11_1_Conjugate_Gradient_Descent [ID:39340]

Hi. Now that we have laid the groundwork for how conjugate gradient descent basically works, we still have to fill in a few details, and the most important one is how to obtain a set of conjugate directions d_i, for i from 0 to n-1. One idea might be a modification of the Gram-Schmidt algorithm, which constructs such a set, but its computational complexity is equal to that of applying Gaussian elimination in order to solve Bx = b. So why is this relevant? Think back to the motivation of this algorithm: what we are trying to do is either solve this linear equation, which is for example the normal equation of an inverse problem, or minimize a quadratic function, which is equivalent to this problem. In the beginning we said that we can of course solve this exactly by applying Gaussian elimination, but that is too costly. So if we were to use the Gram-Schmidt algorithm, which constructs a set of conjugate vectors for us, then we would have lost all motivation for doing such an iterative algorithm, because in the end we are doing something of equal complexity. We have to find something that is quicker than Gaussian elimination, or we could just as well solve for the solution exactly, and that wouldn't make a lot of sense. The main idea here is therefore to not compute the set of d_i in advance, because that will always be as difficult as solving the equation itself.

Rather, what we want to do is pick a good starting direction, and the best guess is picking d_0 equal to the negative gradient of phi at x_0. That is actually the only thing that makes sense to do, because we don't have anything else to start with. For this direction the first step is of course identical to gradient descent, and we can use the optimal step size alpha_0, which we computed last time. So this one step would be x_1 = x_0 + alpha_0 d_0. And now note that r_1 is orthogonal to r_0; that is something we proved in the last lecture. Let me go back one slide.
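As a quick sanity check, this first step can also be reproduced numerically. The following is a small sketch (NumPy; the symmetric positive definite matrix B and right-hand side b are made-up examples, not from the lecture) verifying that r_1 is orthogonal to r_0 when the optimal step size is used:

```python
import numpy as np

# Made-up small SPD system Bx = b, purely for illustration.
rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
B = M @ M.T + 4 * np.eye(4)           # symmetric positive definite by construction
b = rng.standard_normal(4)

x0 = np.zeros(4)
r0 = b - B @ x0                       # residual = negative gradient of phi at x0
d0 = r0                               # starting direction: steepest descent

alpha0 = (r0 @ r0) / (d0 @ (B @ d0))  # optimal step size from the last lecture
x1 = x0 + alpha0 * d0
r1 = b - B @ x1

print(abs(r1 @ r0))                   # ~0: r1 is orthogonal to r0 (= d0)
```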

Right, so in the first step our direction d_0 is equal to r_0; let's keep in mind that these are the same thing. And I'm claiming that the next residual is orthogonal to d_0, which is r_0. That is a consequence of Lemma 4.3: r_{i+1} is orthogonal to d_i, which in our case means that r_1 is orthogonal to d_0, or r_0. Okay, so that's true by Lemma 4.3. And now we pick a tentative direction d_1

tilde, which we will not take, but which is our first guess: this is r_1, which is equal to the negative gradient of phi at x_1. If we took this d_1 tilde directly, the directions would not form a conjugate set, which is not optimal. So we have to modify this direction and compute d_1 from d_1 tilde such that d_1 is B-orthogonal to r_0, which is the same thing as d_0. And how do we do this? By setting d_1 = d_1 tilde + beta_1 d_0. This is the basic orthogonalization ansatz. We now have to pick the right beta_1 such that d_1 is B-orthogonal to d_0, and the correct constant is beta_1 = (r_1^T r_1) / (r_0^T r_0). Check that this is correct; it is a computation you can do by hand. And now d_1 is B-orthogonal
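If you would rather check this numerically than by hand, here is a small sketch (NumPy, with a made-up SPD matrix B) that builds d_1 with this beta_1 and confirms the B-orthogonality:

```python
import numpy as np

# Made-up small SPD system, for illustration only.
rng = np.random.default_rng(1)
M = rng.standard_normal((4, 4))
B = M @ M.T + 4 * np.eye(4)
b = rng.standard_normal(4)

x0 = np.zeros(4)
r0 = b - B @ x0
d0 = r0
alpha0 = (r0 @ r0) / (d0 @ (B @ d0))   # optimal step size
x1 = x0 + alpha0 * d0
r1 = b - B @ x1                        # tentative direction d1 tilde

beta1 = (r1 @ r1) / (r0 @ r0)          # the constant claimed above
d1 = r1 + beta1 * d0                   # orthogonalization ansatz

print(abs(d1 @ (B @ d0)))              # ~0: d1 is B-orthogonal to d0
```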

to d_0. Okay. In the next iteration we again pick a candidate direction, d_2 tilde, defined as r_2, which is the negative gradient of phi at x_2. Again, this direction should not be taken as it is; we have to modify it. By Lemma 4.4 we know that r_2, which is our d_2 tilde, is already B-orthogonal to d_0. So we set d_2 = d_2 tilde + beta_2 d_1, where beta_2 = (r_2^T r_2) / (r_1^T r_1), and then d_2 is B-orthogonal to d_0 and to d_1. So maybe you now get the basic idea: you take this residual, which is already B-orthogonal to all directions up to the penultimate one, and you modify it such that it is also B-orthogonal with respect to the previous direction. And you repeat that again and again. So in the i-th iteration you have d_{i+1} tilde, which you define as r_{i+1}, and you know that d_{i+1} tilde is already B-orthogonal to all the previous directions up to d_{i-1}.
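Putting the pattern together, the whole construction can be sketched as a loop (again NumPy with a made-up SPD matrix, not code from the lecture): in each step the new residual is corrected against the previous direction only, and the resulting directions come out mutually B-orthogonal.

```python
import numpy as np

# Made-up SPD system Bx = b with n = 6, for illustration.
rng = np.random.default_rng(2)
n = 6
M = rng.standard_normal((n, n))
B = M @ M.T + n * np.eye(n)
b = rng.standard_normal(n)

x = np.zeros(n)
r = b - B @ x                          # r0
d = r.copy()                           # d0 = r0
directions = []

for _ in range(n):
    Bd = B @ d
    alpha = (r @ r) / (d @ Bd)         # optimal step size
    x = x + alpha * d
    r_new = r - alpha * Bd             # next residual
    beta = (r_new @ r_new) / (r @ r)   # correction constant beta_{i+1}
    directions.append(d)
    d = r_new + beta * d               # correct only against the previous direction
    r = r_new

# Check: all pairwise B-inner products of distinct directions vanish.
D = np.array(directions)
G = D @ B @ D.T
off_diag = G - np.diag(np.diag(G))
print(np.max(np.abs(off_diag)) / np.max(np.abs(np.diag(G))))   # ~0
```

Note that the loop never touches the older directions explicitly; their B-orthogonality to the new direction comes for free from the lemmas above, which is exactly what makes this cheaper than a full Gram-Schmidt pass.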

Part of a video series:
Access: Open access
Duration: 00:28:23 min
Recording date: 2021-12-14
Uploaded: 2021-12-14 13:46:06
Language: en-US
