Okay, so good morning to all of you again.
The goal of the lecture today, continuing with what we started yesterday, is to explain
how one can obtain this classical result, or similar results, on universal approximation
for which, as you may remember, in one of the first lectures we gave a proof based on the Hahn-Banach
theorem. This was actually the original proof of Cybenko on universal approximation,
the fact that combinations of sigmoids, translated, dilated, and scaled, are able to generate an
approximation of any function. You recall that it was obtained out of Hahn-Banach, but the
goal here is to explain how the same kind of conclusion can be achieved, in a different manner,
in the context of control systems. As we said yesterday, the
distinctive feature of the neural differential equations we are considering these days is
that the nonlinearity of these neural differential equations is given by this kind of sigmoid
function, which is not very typical in the context of mechanical systems, where we
rather encounter polynomials, trigonometric functions, and so on. We will fix our attention in particular on
this sigmoid function, which is globally Lipschitz but is not smooth. As we said,
the most prototypical, the simplest problem we could consider is that in which we are simply
trying to classify data. We said, well, we can reformulate that as a simultaneous control
problem. Why do we say this is a simultaneous control problem? Well, we said, okay, let us
rather than consider a one-layer, say, neural network as in the original work of Cybenko,
let us consider a neural network with multiple layers and let us do it in an incremental manner.
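In formulas, and with notation that is assumed here rather than fixed in this passage, the one-layer combination of sigmoids in Cybenko's setting and the incremental, layer-by-layer update described next can be sketched as

\[
f(x) \approx \sum_{j=1}^{m} c_j\, \sigma(a_j \cdot x + b_j)
\qquad \text{versus} \qquad
x^{k+1} = x^{k} + h\, w^{k}\, \sigma\big(a^{k} \cdot x^{k} + b^{k}\big), \quad k = 0, 1, \dots,
\]

where h is a small parameter and w^k, a^k, b^k are the parameters chosen at layer k.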
This is what is called the residual neural network, in which we are moving from our
configuration of the data into a different configuration of the data. From k to k plus
one, we are generating a discrete dynamical system that out of the initial configuration
is mapping the data into a new one in which we expect that they will get classified properly
according to the labels. So we generate a discrete dynamical system in which, when h is a small
parameter, we are simply making a small variation of the identity operator out of the sigmoid
function. But because we are allowed to choose the parameters a, b, and w, as we said,
we enjoy all the possibilities that the sigmoid function allows: in the case of the ReLU, it is
able to freeze half of the space while moving the other half linearly, in the direction we wish.
Then, in order to establish an even clearer link with
the theory of differential equations, the dynamics of differential equations and the control of
differential equations, we said if in this time discrete neural network we assume that h is small,
then we are close to the regime of the neural differential equations written here.
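A sketch of the neural differential equation being referred to, with the same assumed notation and one common arrangement of the parameters (the exact formula on the board may differ), is

\[
\dot{x}(t) = w(t)\, \sigma\big(a(t) \cdot x(t) + b(t)\big), \qquad t \in (0, T), \qquad x(0) = x^{0}.
\]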
So you see now that this is a classical non-autonomous Cauchy problem. It is a
non-autonomous differential equation because the nonlinearity depends not only on the state x,
but also on the time variable t. The sigma, even when we consider the ReLU,
is a Lipschitz function, so there is no problem in terms of the
application of the Cauchy-Lipschitz theorem for the existence and uniqueness of a solution of
the Cauchy problem. And contrary to the linear case, where we looked at the classical
controllability problem in which you give me an initial datum, you give me a target, and then
I am supposed to build the control going from one to the other, in such a way that the control
will change whenever you change the initial datum and the target, here we are facing a huge
simultaneous control problem in the sense that I am supposed to build these differential equations
so that whenever I take the capital N different initial data
that are to be classified, I consider them as being initial data of this
neural differential equation. The control has to achieve the simultaneous goal of driving each of
them to the corresponding destination. And as I said, in the context of the neural differential
equations, where the time discrete dynamics becomes a time continuous dynamics, the controls
b(t), a(t), and w(t) now depend continuously on time. So, in particular, we consider
controls which are in L1, in L2, or in L-infinity with respect to time. As I said before,
because sigma is globally Lipschitz, this problem will be well posed. And this is the point of view that...
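A compact way to write the simultaneous control problem described above, with assumed notation (N labelled data points x_i^0 with labels y_i, a time horizon T, and a target set S_y associated with each label y), is

\[
\dot{x}_i(t) = w(t)\, \sigma\big(a(t) \cdot x_i(t) + b(t)\big), \qquad x_i(0) = x_i^{0}, \qquad i = 1, \dots, N,
\]
\[
\text{find } (w, a, b) \in L^{\infty}(0, T) \ \text{such that} \ x_i(T) \in S_{y_i} \ \text{for every } i = 1, \dots, N,
\]

the key point being that the same controls w, a, b have to work simultaneously for all N trajectories.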
S07: Controllability (2)
Date: July 2024
Course: Control and Machine Learning
Lecturer: Prof. Enrique Zuazua
_
Check all details at: https://dcn.nat.fau.eu/course-control-machine-learning-zuazua/
TOPICS
S01: Introduction to Control Theory
S02: Introduction: Calculus of Variations, Controllability and Optimal Design
S03: Introduction: Optimization and Perspectives
S04: Finite-dimensional Control Systems (1)
S05: Finite-dimensional Control Systems (2) and Gradient-descent methods (1)
S06: Gradient-descent methods (2), Duality algorithms, and Controllability (1)
S07: Controllability (2)
S08: Neural transport equations and infinite-dimensional control systems
S09: Wave equation control systems
S10: Momentum Neural ODE and Wave equation with viscous damping
S11: Heat and wave equations: Control systems and Turnpike principle (1)
S12: Turnpike principle (2), Deep Neural and Collective-dynamics