24 - Accelerated Forward-Backward Optimization using Deep Learning [ID:35911]

Oh yeah, thank you, Daniel Leon, for the introduction.

So my name is, again, Jevgenija, and

I'm going to talk about accelerated forward-backward optimization using deep learning.

I will also try to pay attention to the chat, but, as practice shows, I'm usually not very good at it.

So I mean, for me, it's better if you just ask your question out loud.

Okay, let's start. This is a joint work with Sebastian Banert, Jonas Adler and Ozan Öktem.

And in this work, we propose two methods for solving convex optimization problems. And in these methods, we use deep learning to obtain faster convergence.

And the most important theoretical contribution is that we actually provide convergence guarantees for the methods that we propose.

And at the end of the talk, I will also show you the experiments where we applied our methods to a reconstruction problem in computed tomography.

First, let's start with the setup. We are minimizing a function f that belongs to a class of functions that share a similar structure.

And what do I mean by this?

Well, when we are dealing with inverse problems, we often use variational models.

And there we have some measured data y, and we want to estimate an unobserved signal x from this data.

So in the variational framework, we do this by minimizing a function f, where the first term is a data fidelity term, which ensures that our signal corresponds to the measured data.

And the second term is some regularizing component that ensures that x has certain qualities that we know beforehand.
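To make this concrete, here is a hedged sketch of what such a variational objective typically looks like; the least-squares fidelity, the forward operator A, the regularizer R and the weight λ are illustrative notation on my part, not taken verbatim from the talk.

% Illustrative variational objective for an inverse problem:
% A is the forward operator, y the measured data, R a regularizer, lambda > 0 its weight.
\[
  \min_{x} \; f(x) \;=\; \underbrace{\tfrac{1}{2}\,\lVert A x - y \rVert_2^2}_{\text{data fidelity}}
  \;+\; \underbrace{\lambda\, R(x)}_{\text{regularization}}
\]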

And if you have such a problem, for instance a deblurring problem, where we want to deblur images, then typically we want to solve this problem not only for one case,

but we have measured data y many times, and therefore we want to solve this problem not just once.

Essentially, we want to solve different problems, but those are somehow similar.

And in this case, the problem might look quite simple, but in practice, we often deal with x and y, which are really high dimensional.

And therefore, we need some speed up.

And also, another complication is that this function f is not necessarily differentiable.

Our main goal here is to use statistical learning to exploit the common structure of these problems to obtain faster convergence.

And to do this, we propose to use deviation-based schemes.

And here you see an example of this.

So this is almost a simple gradient descent scheme, but it has this deviation term here, Δx_n.

So this is some deviation from a regular gradient update.

And what we propose is to learn this deviation using neural networks.
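As a rough illustration, here is a minimal sketch of such a deviation-based scheme, assuming a callable grad_f for the gradient of f and a learned model deviation_net that proposes the extra term; the names and signature are my own placeholders, not the paper's code, and the iterates may be NumPy arrays or PyTorch tensors.

def deviation_gradient_descent(x0, grad_f, deviation_net, step_size, num_iters):
    # Gradient descent with an additional learned deviation from the regular update.
    x = x0
    for _ in range(num_iters):
        grad = grad_f(x)
        # the network proposes a deviation Δx_n from the plain gradient step
        delta_x = deviation_net(x, grad)
        x = x - step_size * grad + delta_x
    return x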

And our goal here is to get faster convergence on average, given a probability distribution over these functions f.

But if you have worked with neural networks, you probably know that they usually work very well when you are dealing with samples that are somehow similar to your training set.

But as soon as you get a sample which is out of your distribution, they might behave unpredictably.

And therefore, we have another requirement.

So we want to guarantee the convergence of this method for any f, even the one which might appear with low probability.

And in this case, it will mean that we might not get as fast convergence for that unusual f.

The method will just take more iterations to converge, but it is still guaranteed to converge.
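As a generic illustration of how a learned update can be safeguarded so that convergence holds for any f, here is a toy sketch in which the deviation is only accepted if it lies in an admissible ball around the plain gradient step; the ball, the radius_factor parameter and the function name are my own assumptions, not the admissible set constructed in the paper.

import numpy as np

def safeguarded_update(x, grad, delta_x, step_size, radius_factor=0.5):
    # Accept the learned deviation only if it is small relative to the gradient step;
    # otherwise scale it back, so the iteration stays close to plain gradient descent.
    radius = radius_factor * step_size * np.linalg.norm(grad)
    norm = np.linalg.norm(delta_x)
    if norm > radius:
        delta_x = delta_x * (radius / norm)
    return x - step_size * grad + delta_x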

Next, we consider two different cases.

The first is the smooth case, and here we are minimizing a function f which is convex and differentiable.

And it also has a continuous gradient.

And another case is a non-smooth case.

And here we are dealing with a function which consists of two terms.

The first term is smooth: differentiable with a continuous gradient.

And the second term, g, is a proper, convex and lower semi-continuous function.

But it is not necessarily differentiable.
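Written out, the two problem classes look as follows; I denote the smooth term of the composite case by f as well, following the talk, and I state Lipschitz continuity of the gradient explicitly, since that is the standard assumption behind the rates quoted below (the talk itself only says "continuous gradient").

% Smooth case: f convex and differentiable with an L-Lipschitz gradient.
\[
  \min_{x} \, f(x), \qquad \lVert \nabla f(x) - \nabla f(z) \rVert \le L \lVert x - z \rVert .
\]
% Non-smooth case: smooth f as above plus a proper, convex, lower semi-continuous g.
\[
  \min_{x} \, f(x) + g(x).
\]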

How are these problems typically addressed?

For this smooth case, the most basic method to use is gradient descent.

And it has a convergence rate of 1/n, where n is the number of iterations.

And there is an accelerated approach called Nesterov's scheme.

And it utilizes some momentum in the sequence, and it obtains a convergence rate of 1/n².
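For comparison, here is a minimal sketch of both updates for a smooth convex f; grad_f and step_size are assumed inputs, and the particular momentum sequence t_n below is one common (FISTA-style) choice rather than necessarily the variant used in the talk.

def gradient_descent(x0, grad_f, step_size, num_iters):
    # Plain gradient descent: O(1/n) convergence of the objective values.
    x = x0
    for _ in range(num_iters):
        x = x - step_size * grad_f(x)
    return x

def nesterov_accelerated(x0, grad_f, step_size, num_iters):
    # Nesterov-type acceleration via an extrapolated point: O(1/n^2) rate.
    x, x_prev, t = x0, x0, 1.0
    for _ in range(num_iters):
        t_next = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0
        y = x + ((t - 1.0) / t_next) * (x - x_prev)  # momentum/extrapolation step
        x_prev, x = x, y - step_size * grad_f(y)     # gradient step at the extrapolated point
        t = t_next
    return x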

Similarly, for the non-smooth case, there is a basic scheme called proximal gradient descent.
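A minimal sketch of proximal gradient descent (forward-backward splitting) for the composite problem f + g; prox_g is an assumed callable evaluating the proximal operator of step_size·g (for example, soft-thresholding when g is the ℓ1 norm).

def proximal_gradient_descent(x0, grad_f, prox_g, step_size, num_iters):
    x = x0
    for _ in range(num_iters):
        v = x - step_size * grad_f(x)   # forward (gradient) step on the smooth part f
        x = prox_g(v, step_size)        # backward (proximal) step on the non-smooth part g
    return x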

Part of a video series:

Accessible via: Open Access

Duration: 00:46:52 min

Recording date: 2021-07-26

Uploaded on: 2021-07-26 14:36:07

Language: en-US

Jevgenija Rudzusika (KTH Stockholm) on Accelerated Forward-Backward Optimization using Deep Learning:

We propose several deep-learning accelerated optimization solvers with convergence guarantees. We use ideas from the analysis of accelerated forward-backward schemes like FISTA, but instead of the classical approach of proving convergence for a choice of parameters, such as a step-size, we show convergence whenever the update is chosen in a specific set. Rather than picking a point in this set using some predefined method, we train a deep neural network to pick the best update. Finally, we show that the method is applicable to several cases of smooth and non-smooth optimization and show superior results to established accelerated solvers.
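For reference, a minimal sketch of classical FISTA (the Beck–Teboulle accelerated forward-backward scheme referred to above); grad_f, prox_g and step_size are assumed inputs, and this is the textbook method rather than the learned solver proposed in the talk.

def fista(x0, grad_f, prox_g, step_size, num_iters):
    # Classical FISTA: forward-backward step at an extrapolated point y,
    # combined with the standard t_n momentum sequence.
    x, y, t = x0, x0, 1.0
    for _ in range(num_iters):
        x_next = prox_g(y - step_size * grad_f(y), step_size)
        t_next = (1.0 + (1.0 + 4.0 * t * t) ** 0.5) / 2.0
        y = x_next + ((t - 1.0) / t_next) * (x_next - x)
        x, t = x_next, t_next
    return x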

Tags

functional minimization methods framework approximation control distance reconstruction energy deep search basic weights models measure layer activation problem example propagation