And thank you for coming here and listening to this partly theoretical talk.
And as mentioned, I will talk about optimization methods in machine learning: first I will
show what is classically done, and then what we could do to improve
these schemes.
And first of all, machine learning, or more precisely supervised learning, means we have input
data and we have output data.
So we have X as input data and Y as output data, and we try to approximate the mapping
from the input to the output.
And everything is, let's say, finite-dimensional, so we have a lot of data and try
to find this mapping.
And as an assumption for this talk, we assume that the input data are realizations of a random
variable X which has a probability measure mu, and this probability measure might be unknown.
And there exists, of course, an unknown function which maps the input data to the output data.
Of course we want to find it, and we don't know it yet, but we assume that this function
is Lipschitz continuous.
We can relax this condition a bit but we have to have some kind of regularity of the mapping
from the input to the output.
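To make this precise, here is a compact way to write these standing assumptions (the notation is mine, not from the slides, so take it as a sketch of what was said):

```latex
% Standing assumptions, written in my own notation:
%   the input is a realization of a random variable X with (possibly unknown) law \mu,
%   and the unknown input-output map f is Lipschitz continuous.
\[
  X \sim \mu , \qquad
  f\colon \mathcal{X} \to \mathcal{Y}, \qquad
  \| f(x_1) - f(x_2) \|_{\mathcal{Y}} \le L \, \| x_1 - x_2 \|_{\mathcal{X}}
  \quad \text{for all } x_1, x_2 \in \mathcal{X}.
\]
```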
So for the learning task, of course, we want to minimize some expected value of an objective
functional which somehow measures the quality of an approximation.
So for example, we have a neural network and try to approximate the mapping from input
to output, and we measure it, for example, in the L2 norm.
And of course we have an expected value, which is a mapping from some random variable to the
real values, and we have a compact set of designs.
So U admissible is the set of designs.
In a neural network it would be the weights and the biases; all this stuff together
would be the design domain.
So we minimize this expected value.
And in integral form, we minimize the integral over some objective functional.
So we write down an objective function J of u, which is an integral.
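Written out, the learning task just described takes roughly this form (a sketch in my notation; j is the per-sample objective and U_ad the compact admissible design set):

```latex
% Learning task in expected-value and integral form (my notation):
%   u collects the designs (e.g. weights and biases), U_ad is compact,
%   j(u, x) measures the approximation quality for input x.
\[
  \min_{u \in U_{\mathrm{ad}}} J(u), \qquad
  J(u) := \mathbb{E}\big[ j(u, X) \big]
        = \int_{\mathcal{X}} j(u, x) \, \mathrm{d}\mu(x).
\]
```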
And the question is, first of all, what this — yes, of course?
Why did you use the new psi, but psi doesn't appear in this?
Psi? Yeah, yeah.
So you will see later that J can depend on psi.
For example, it's the difference, the L2 difference: J of u here is the L2 difference of psi
and f. The J of u is very generic.
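For the L2 example mentioned here, one concrete choice would be the following (again my notation; psi(x; u) denotes the network output for design u):

```latex
% One possible concrete objective: squared L2 distance between the
% parametrized approximation psi(.; u) and the unknown map f.
\[
  J(u) = \| \psi(\cdot\,; u) - f \|_{L^2(\mu)}^2
       = \int_{\mathcal{X}} \| \psi(x; u) - f(x) \|^2 \, \mathrm{d}\mu(x).
\]
```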
Yeah?
But the output doesn't appear there either.
Yes, but that's just any arbitrary function which maps the parameters to a real number.
So it can be the input-output distance or some other function.
For example, we can have classical machine learning, where it's the distance between the approximation
and the input-output mapping.
We can also think about totally different things, for example model-based machine learning.
So psi, the neural network, could approximate the dynamics of an ODE.
And then we measure the output of the ODE at the final time minus the input-output mapping.
So we can think about model-based machine learning, not machine learning of the whole thing.
So part is done by the ODE and part by the neural network.
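As an illustration of this model-based setup, here is a minimal sketch in Python (names like neural_net and ode_output are mine; the talk does not specify the architecture or the time integrator, so a one-hidden-layer network and explicit Euler are assumptions):

```python
import numpy as np

def neural_net(x, u):
    """Toy one-hidden-layer network psi(x; u); the design u = (W1, b1, W2, b2)
    collects the weights and biases. Purely illustrative."""
    W1, b1, W2, b2 = u
    return W2 @ np.tanh(W1 @ x + b1) + b2

def ode_output(x0, u, T=1.0, n_steps=100):
    """The network approximates the ODE dynamics; integrate them with explicit
    Euler up to the final time T and return the state there."""
    dt = T / n_steps
    state = np.asarray(x0, dtype=float)
    for _ in range(n_steps):
        state = state + dt * neural_net(state, u)
    return state

def empirical_objective(u, xs, ys):
    """Empirical counterpart of J(u): mean squared mismatch between the ODE
    output at the final time and the given input-output data (x_i, y_i)."""
    return np.mean([np.sum((ode_output(x, u) - y) ** 2) for x, y in zip(xs, ys)])
```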
And if we leave the field of machine learning, we can think about, for example, infinitely many
scenarios in PDE-constrained optimization.
So for example, we do compliance minimization.
So we have a load vector; it's all discretized now.
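The compliance example can be sketched as follows (this is the standard discretized form; the talk only mentions the load vector, so the rest of the notation is an assumption on my part):

```latex
% Discretized compliance minimization (standard form, notation assumed):
%   f is the load vector, K(u) the stiffness matrix for design u,
%   and v(u) the resulting displacement.
\[
  \min_{u \in U_{\mathrm{ad}}} J(u) = f^{\top} v(u)
  \qquad \text{subject to} \qquad K(u)\, v(u) = f .
\]
```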