So, welcome to this lecture on the Convergence of Stochastic Gradient Descent.
So, in the beginning, let me just repeat the slide that we had in the last lecture, where
I already told you that we've learned quite a lot in this course on neural networks.
So, we've learned that they perform great in many applications, like classification,
image and language processing, gaming, etc.
Secondly, that they are typically composed of artificial neurons, which have a very simple
form.
They're just given by the composition of an affine function, involving weights and biases,
with a nonlinear activation function, psi.
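As a minimal sketch, this composition of an affine map with an activation could look as follows; the sigmoid is used here as one common choice for psi (the lecture leaves the activation generic):

```python
import math

def psi(z):
    # sigmoid activation, one common (discriminatory) choice for psi
    return 1.0 / (1.0 + math.exp(-z))

def neuron(x, w, b):
    # affine map <w, x> + b composed with the nonlinear activation psi
    return psi(sum(wi * xi for wi, xi in zip(w, x)) + b)

# Example evaluation of a single neuron with two inputs
print(neuron([0.0, 0.0], [1.0, 1.0], 0.0))  # sigmoid(0) = 0.5
```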
Yeah, then we've already heard that the free parameters, that is, the weights and biases,
can for instance be computed using stochastic gradient descent on a loss function.
And in the last lecture, we've also seen that, theoretically, neural networks can solve any
task, meaning that they can approximate any given function.
And so, in the last lecture, we've proven a universal approximation theorem for a very
specific class of neural networks, showing that they admit this approximation property.
Good.
So, but there are a couple of things that we do not know yet.
So, first of all, and this will be the topic of today's lecture, does stochastic gradient
descent converge?
So, we have seen how it is defined and how you can use it, but we haven't
spoken about it theoretically yet and investigated whether, and under which conditions, it converges
to, let's say, a good solution of our problem.
And another open question is, for instance, are neural networks stable and if not, can
they be made stable?
This will be the topic of next week's lectures.
And yeah, of course, there are many more unanswered questions on neural networks, which is why
they are still a very hot topic in research.
All right, so they are very mysterious objects and a lot of open questions remain.
However, of course, in the course of this lecture, we can only speak about a few of
these important questions, let's say.
Okay, so let me recap what we heard last time.
So what we did there was we proved a universal approximation theorem, which very informally
and avoiding all mathematical details can be stated like this.
So, neural networks with a discriminatory activation function and sufficiently
many neurons or layers can approximate any given function.
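In notation, the informal statement above could be sketched as follows for a single-hidden-layer network; the symbols f, K, N, and epsilon are not from the lecture and are only used here to make the statement concrete:

```latex
% Sketch: for a continuous target $f$ on a compact set $K$,
% a discriminatory activation $\psi$, and any tolerance $\varepsilon > 0$,
% there exists a network with sufficiently many neurons $N$,
\Phi(x) = \sum_{i=1}^{N} a_i \, \psi(w_i^\top x + b_i),
% such that
\sup_{x \in K} \, |f(x) - \Phi(x)| < \varepsilon .
```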
And so we have talked about what are discriminatory activation functions, and we have proved this
theorem here for continuous functions.
But I've also told you that you can, in fact, extend it to a larger class of functions that
you would like to approximate.
Okay, so this is very nice from a theoretical point of view.
However, as I said last time, it is not quite clear from the proof of this theorem how you
can actually compute such a network which approximates your given function.
So this is still a remaining question.
And as we've already seen a couple of lectures ago, what you do in practice is you use an
algorithm which is called stochastic gradient descent.
And what it needs is the following.
So first, you start with a so-called training set, which is a subset of the Cartesian product
of the input space of the network and the output space.
And basically, this training set contains sample values from the function that you would
like to approximate.
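The ingredients just described, a training set of input-output samples and a loss whose gradient drives the updates, can be sketched as follows; the one-parameter model, the squared loss, and all names here are illustrative assumptions, not the lecture's notation:

```python
import random

def sgd(training_set, grad_loss, theta0, lr=0.1, epochs=100):
    # Stochastic gradient descent sketch: in each step, draw one random
    # sample from the training set and step against the loss gradient.
    theta = theta0
    for _ in range(epochs):
        x, y = random.choice(training_set)
        theta -= lr * grad_loss(theta, x, y)
    return theta

# Toy training set: samples (x, f(x)) from the target f(x) = 2*x,
# i.e. a finite subset of the Cartesian product of input and output space.
data = [(x, 2.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]

# Gradient of the squared loss (theta*x - y)^2 with respect to theta
grad = lambda theta, x, y: 2.0 * (theta * x - y) * x

theta_hat = sgd(data, grad, theta0=0.0)
print(theta_hat)  # close to the true parameter 2.0
```

Whether such an iteration actually converges, and under which conditions, is exactly the question this lecture addresses.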
Presenter: Leon Bungert
Access: open access
Duration: 00:46:55 min
Recording date: 2021-05-16
Uploaded: 2021-05-16 19:56:55
Language: en-US