And thank you for coming here and listening to this partly theoretical talk.
And as mentioned, I will talk about optimization methods in machine learning: first I will
show what is classically done, and then what we could do to improve
these schemes.
And first of all, machine learning, or more precisely supervised learning, means we have input
data and we have output data.
So we have X as input data and Y as output data, and we try to approximate the mapping
from the input to the output.
And everything is, let's say, finite-dimensional, so we have a lot of data and try
to find this mapping.
And as an assumption for this talk, we assume that the input data are realizations of a random
variable X which has a probability measure mu, and this probability measure might be unknown.
And there exists, of course, an unknown function which maps the input data to the output data.
Of course we want to find it, and we don't know it yet, but we assume that this function
is Lipschitz continuous.
We can relax this condition a bit but we have to have some kind of regularity of the mapping
from the input to the output.
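To make this precise, here is a compact way to write these standing assumptions (the notation is mine, not from the slides, so take it as a sketch of what was said):

```latex
% Standing assumptions, written in my own notation:
%   the input is a realization of a random variable X with (possibly unknown) law \mu,
%   and the unknown input-output map f is Lipschitz continuous.
\[
  X \sim \mu , \qquad
  f\colon \mathcal{X} \to \mathcal{Y}, \qquad
  \| f(x_1) - f(x_2) \|_{\mathcal{Y}} \le L \, \| x_1 - x_2 \|_{\mathcal{X}}
  \quad \text{for all } x_1, x_2 \in \mathcal{X}.
\]
```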
So for the learning task, of course, we want to minimize some expected value of an objective
functional which somehow measures the quality of an approximation.
So for example, we have a neural network and try to approximate the mapping from input
to output, and we measure it, for example, in the L2 norm.
And of course we have an expected value, which is a mapping from some random variable to the
real values, and we have a compact set of designs.
So U admissible is the set of designs.
In a neural network it would be the weights and the biases; all this stuff together
would be the design domain.
So we minimize this expected value.
And in integral form, we minimize the integral over some objective functional.
So we write down an objective function J of u, which is an integral.
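Written out, the learning task just described takes roughly this form (a sketch in my notation; j is the per-sample objective and U_ad the compact admissible design set):

```latex
% Learning task in expected-value and integral form (my notation):
%   u collects the designs (e.g. weights and biases), U_ad is compact,
%   j(u, x) measures the approximation quality for input x.
\[
  \min_{u \in U_{\mathrm{ad}}} J(u), \qquad
  J(u) := \mathbb{E}\big[ j(u, X) \big]
        = \int_{\mathcal{X}} j(u, x) \, \mathrm{d}\mu(x).
\]
```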
And the question is, first of all, what this — yes, of course?
Why did you use the new psi, but psi doesn't appear in this?
Psi? Yeah, yeah.
So you will see later that J can depend on psi.
For example, it's the difference, the L2 difference: J of u here is the L2 difference of psi
and f. The J of u is very generic.
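For the L2 example mentioned here, one concrete choice would be the following (again my notation; psi(x; u) denotes the network output for design u):

```latex
% One possible concrete objective: squared L2 distance between the
% parametrized approximation psi(.; u) and the unknown map f.
\[
  J(u) = \| \psi(\cdot\,; u) - f \|_{L^2(\mu)}^2
       = \int_{\mathcal{X}} \| \psi(x; u) - f(x) \|^2 \, \mathrm{d}\mu(x).
\]
```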
Yeah?
But the output doesn't appear there either.
Yes, but that's just any arbitrary function which maps the parameters to a real number.
So it can be the input-output distance or some other function.
For example, we can have classical machine learning, where it's the distance between the approximation
and the input-output mapping.
We can also think about totally different things, for example model-based machine learning.
So psi, the neural network, could approximate the dynamics of an ODE.
And then we measure the output of the ODE at the final time minus the input-output mapping.
So we can think about model-based machine learning, not machine learning of the whole thing.
So part is done by the ODE and part by the neural network.
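As an illustration of this model-based setup, here is a minimal sketch in Python (names like neural_net and ode_output are mine; the talk does not specify the architecture or the time integrator, so a one-hidden-layer network and explicit Euler are assumptions):

```python
import numpy as np

def neural_net(x, u):
    """Toy one-hidden-layer network psi(x; u); the design u = (W1, b1, W2, b2)
    collects the weights and biases. Purely illustrative."""
    W1, b1, W2, b2 = u
    return W2 @ np.tanh(W1 @ x + b1) + b2

def ode_output(x0, u, T=1.0, n_steps=100):
    """The network approximates the ODE dynamics; integrate them with explicit
    Euler up to the final time T and return the state there."""
    dt = T / n_steps
    state = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        state = state + dt * neural_net(state, u)
    return state

def empirical_objective(u, xs, ys):
    """Empirical counterpart of J(u): mean squared mismatch between the ODE
    output at the final time and the given input-output data (x_i, y_i)."""
    return np.mean([np.sum((ode_output(x, u) - y) ** 2) for x, y in zip(xs, ys)])
```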
And if we leave the field of machine learning, we can think about, for example, infinitely many
scenarios in PDE-constrained optimization.
So for example, we do compliance minimization.
So we have a load vector; it's all discretized now.
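The compliance example can be sketched as follows (this is the standard discretized form; the talk only mentions the load vector, so the rest of the notation is an assumption on my part):

```latex
% Discretized compliance minimization (standard form, notation assumed):
%   f is the load vector, K(u) the stiffness matrix for design u,
%   and v(u) the resulting displacement.
\[
  \min_{u \in U_{\mathrm{ad}}} J(u) = f^{\top} v(u)
  \qquad \text{subject to} \qquad K(u)\, v(u) = f .
\]
```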