Hi. Now that we have laid the groundwork for how conjugate gradient descent works, we still have to fill in a few details, and the most important one is: how do we obtain a set of conjugate directions di, i from 0 to n-1? One idea might be a modification of the Gram-Schmidt algorithm, which constructs such a set, but its computational complexity is equal to that of applying Gaussian elimination to solve Bx = b. Why is this relevant? Think back to the motivation of this algorithm: we want to either solve this linear equation, which is for example the normal equation of an inverse problem, or minimize the quadratic functional which is equivalent to this problem. In the beginning we said that we could of course solve this exactly by Gaussian elimination, but that is too costly. So if we were to use the Gram-Schmidt algorithm to construct a set of conjugate vectors, we would have lost all the motivation for such an iterative algorithm, because in the end we would be doing something of equal complexity. We have to find something quicker than Gaussian elimination, or we might as well just solve for the solution exactly.
So that doesn't make a lot of sense. The main idea here is instead not to compute the set di in advance, because that will always be as difficult as solving the equation itself.
Rather, what we want to do is pick a good starting direction, and the best guess is picking d0 equal to the negative gradient of phi at x0. That is really the only thing that makes sense, because we don't have anything else to start with. For this direction, the step with the optimal step length alpha0 is of course identical to a gradient descent step; we can use the optimal alpha value which we computed last time. So this one step is x1 = x0 + alpha0 d0. And now note that r1 is orthogonal to r0; that is something we proved in the last lecture, if I'm not mistaken. Let me go back one slide.
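As a quick numerical illustration of this first step, here is a minimal NumPy sketch; the SPD matrix B and right-hand side b are a toy example of my own, not from the lecture. It takes d0 = r0, the optimal step alpha0 = r0^T r0 / (d0^T B d0), and checks that the new residual r1 is indeed orthogonal to r0.

```python
import numpy as np

# Hypothetical SPD system B x = b (my own toy example, not from the lecture).
B = np.array([[4.0, 1.0],
              [1.0, 3.0]])
b = np.array([1.0, 2.0])

x0 = np.zeros(2)
r0 = b - B @ x0                        # residual = negative gradient of phi at x0
d0 = r0                                # first direction: steepest descent
alpha0 = (r0 @ r0) / (d0 @ (B @ d0))   # optimal step length from the lecture
x1 = x0 + alpha0 * d0
r1 = b - B @ x1

print(r1 @ r0)                         # prints 0.0: r1 is orthogonal to r0
```

For this particular B and b the inner product even comes out exactly zero; in general it is zero up to floating-point rounding.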
Right, so in the first step our direction d0 is equal to r0; let's keep in mind that these are the same thing. And I'm claiming that the next residual is orthogonal to d0, which is r0. That is a consequence of Lemma 4.4, well, actually Lemma 4.3, because r_{i+1} is orthogonal to di, which in our case means that r1 is orthogonal to r0, or d0. Okay, so that's true by Lemma 4.3. And now pick a tentative direction d1 tilde, which we will not take but which is our first guess: this is r1, the negative gradient of phi at x1. If we were simply to take this as d1, we would spoil the conjugacy of the set, which is not optimal. So we have to modify this direction and compute d1 from d1 tilde such that d1 is B-orthogonal to d0, which is the same thing as r0. How do we do this? By setting d1 = d1 tilde + beta1 d0. This is the basic orthogonalization ansatz. We now have to pick the right beta1 such that d1 is B-orthogonal to d0, and the correct constant, as the computation shows, is beta1 = (r1 transpose r1) / (r0 transpose r0).
Check that this is correct; it is a computation you can do by hand. And now d1 is B-orthogonal to d0. Okay. In the next iteration we again pick a candidate direction, d2 tilde, defined as r2, the negative gradient of phi at x2. This again is a direction we have to correct. By Lemma 4.4 we know that r2, which is d2 tilde, is B-orthogonal to d0. Now we set d2 = d2 tilde + beta2 d1, where beta2 = (r2 transpose r2) / (r1 transpose r1), and then d2 is B-orthogonal to d0 and to d1. So maybe you now get the basic idea: you take this r2 and modify it until it is B-orthogonal with respect to the previous direction. Before the modification it is already B-orthogonal to the directions up to the penultimate one; then you modify it such that it is also B-orthogonal with respect to the previous direction, and you repeat that again and again. So in the i-th iteration you have d_{i+1} tilde, which you define as r_{i+1}, and you know that d_{i+1} tilde is B-orthogonal to all the previous directions up to d_{i-1}.
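The whole direction construction described above can be sketched in a few lines of NumPy. Again the SPD matrix B, the right-hand side b, and all names are my own illustrative choices: d0 = r0, then d_{i+1} = r_{i+1} + beta_{i+1} d_i with beta_{i+1} = (r_{i+1}^T r_{i+1}) / (r_i^T r_i), and at the end all directions should be pairwise B-orthogonal.

```python
import numpy as np

def conjugate_directions(B, b, x0):
    """Sketch of the direction construction from the lecture:
    d0 = r0, then d_{i+1} = r_{i+1} + beta_{i+1} * d_i,
    with beta_{i+1} = (r_{i+1}^T r_{i+1}) / (r_i^T r_i)."""
    x = x0.copy()
    r = b - B @ x                       # r0 = negative gradient of phi at x0
    d = r.copy()                        # d0 = r0
    directions = [d.copy()]
    for _ in range(len(b) - 1):
        Bd = B @ d
        alpha = (r @ r) / (d @ Bd)      # optimal step length
        x = x + alpha * d
        r_new = r - alpha * Bd          # r_{i+1}
        beta = (r_new @ r_new) / (r @ r)
        d = r_new + beta * d            # orthogonalization ansatz
        directions.append(d.copy())
        r = r_new
    return directions

# Hypothetical SPD example (my own choice, not from the lecture).
B = np.array([[4.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
b = np.array([1.0, 2.0, 3.0])
ds = conjugate_directions(B, b, np.zeros(3))

# All pairs of directions should satisfy d_i^T B d_j ~ 0 for i != j.
for i in range(len(ds)):
    for j in range(i):
        print(i, j, ds[i] @ (B @ ds[j]))
```

By construction each new direction is made B-orthogonal only to its immediate predecessor, yet the printed inner products for all pairs vanish up to rounding; that this single correction suffices is exactly the point developed above.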
Access: open access
Duration: 00:28:23 min
Recording date: 2021-12-14
Uploaded: 2021-12-14 13:46:06
Language: en-US