So, welcome to this lecture on the Convergence of Stochastic Gradient Descent.
So, in the beginning, let me just repeat the slide that we had in the last lecture, where
I already told you that we've learned quite a lot in this course on neural networks.
So, we've learned that they perform great in many applications, like classification,
image and language processing, gaming, etc.
Secondly, that they are typically composed of artificial neurons, which have a very simple
form.
They're just given by the composition of an affine function, involving weights and biases,
with a nonlinear activation function, psi.
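As a minimal sketch, this composition of an affine map with an activation could look as follows; the sigmoid is used here as one common choice for psi (the lecture leaves the activation generic):

```python
import math

def psi(z):
    # sigmoid activation, one common (discriminatory) choice for psi
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, b):
    # affine map <w, x> + b composed with the nonlinear activation psi
    return psi(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Example evaluation of a single neuron with two inputs
print(neuron([0.0, 0.0], [1.0, 1.0], 0.0))  # sigmoid(0) = 0.5
```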
Yeah, then we've already heard that the free parameters, that is, the weights and biases,
can for instance be computed using stochastic gradient descent on a loss function.
And in the last lecture, we've also seen that, theoretically, neural networks can solve any
task, meaning that they can approximate any given function.
And so, in the last lecture, we've proven a universal approximation theorem for a very
specific class of neural networks, showing that they admit this approximation property.
Good.
So, but there are a couple of things that we do not know yet.
So, first of all, and this will be the topic of today's lecture, does stochastic gradient
descent converge?
So, we have seen how it is defined and how you can use it, but we haven't
spoken about it theoretically yet and investigated whether, and under which conditions, it converges
to, let's say, a good solution of our problem.
And another open question is, for instance, are neural networks stable and if not, can
they be made stable?
This will be the topic of next week's lectures.
And yeah, of course, there are many more unanswered questions on neural networks, which is why
they are still a very hot topic in research.
All right, so they are very mysterious objects and a lot of open questions remain.
However, of course, in the course of this lecture, we can only speak about a few of
these important questions, let's say.
Okay, so let me recap what we heard last time.
So what we did there was we proved a universal approximation theorem, which very informally
and avoiding all mathematical details can be stated like this.
So, neural networks with a discriminatory activation function and sufficiently
many neurons or layers can approximate any given function.
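In notation, the informal statement above could be sketched as follows for a single-hidden-layer network; the symbols f, K, N, and epsilon are not from the lecture and are only used here to make the statement concrete:

```latex
% Sketch: for a continuous target $f$ on a compact set $K$,
% a discriminatory activation $\psi$, and any tolerance $\varepsilon > 0$,
% there exists a network with sufficiently many neurons $N$,
\Phi(x) = \sum_{i=1}^{N} a_i \, \psi(w_i^\top x + b_i),
% such that
\sup_{x \in K} \, |f(x) - \Phi(x)| < \varepsilon .
```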
And so we have talked about what are discriminatory activation functions, and we have proved this
theorem here for continuous functions.
But I've also told you that you can, in fact, extend it to a larger class of functions that
you would like to approximate.
Okay, so this is very nice from a theoretical point of view.
However, as I said last time, it is not quite clear from the proof of this theorem how you
can actually compute such a network which approximates your given function.
So this is still a remaining question.
And as we've already seen a couple of lectures ago, what you do in practice is you use an
algorithm which is called stochastic gradient descent.
And what it needs is the following.
So first, you start with a so-called training set, which is a subset of the Cartesian product
of the input space of the network and the output space.
And basically, this training set contains sample values from the function that you would
like to approximate.
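The ingredients just described, a training set of input-output samples and a loss whose gradient drives the updates, can be sketched as follows; the one-parameter model, the squared loss, and all names here are illustrative assumptions, not the lecture's notation:

```python
import random

def sgd(training_set, grad_loss, theta0, lr=0.1, epochs=100):
    # Stochastic gradient descent sketch: in each step, draw one random
    # sample from the training set and step against the loss gradient.
    theta = theta0
    for _ in range(epochs):
        x, y = random.choice(training_set)
        theta -= lr * grad_loss(theta, x, y)
    return theta

# Toy training set: samples (x, f(x)) from the target f(x) = 2*x,
# i.e. a finite subset of the Cartesian product of input and output space.
data = [(x, 2.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]

# Gradient of the squared loss (theta*x - y)^2 with respect to theta
grad = lambda theta, x, y: 2.0 * (theta * x - y) * x

theta_hat = sgd(data, grad, theta0=0.0)
print(theta_hat)  # close to the true parameter 2.0
```

Whether such an iteration actually converges, and under which conditions, is exactly the question this lecture addresses.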
Presenter: Leon Bungert
Access: open access
Duration: 00:46:55 min
Recording date: 2021-05-16
Uploaded: 2021-05-16 19:56:55
Language: en-US