2 - Universal Approximation with Neural Networks

So, welcome to this lecture on universal approximation with neural networks.

In the last couple of lectures, we've already learned a lot about neural networks, in particular where they are applied and how one can train them.

As we have seen, neural networks perform very well on a variety of tasks, for instance classification, image processing, and natural language processing, but also gaming, etc.

And this explains why they are so popular nowadays: they surpass the classical methods which had been applied to these tasks before.

So, we've already seen that neural networks are composed of artificial neurons, that's

why they are called neural networks.

And such a neuron has a simple mathematical form, let's call it phi of x, where x is the input. It is given by some function psi, which is called the activation function, applied to the inner product of a weight vector w with the input x, plus a bias b, which is just a number; in formulas, phi(x) = psi(<w, x> + b).
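As a small illustration, here is a minimal sketch of such a neuron in Python with NumPy; the function name and the choice of tanh as the activation psi are illustrative assumptions, not something fixed by the lecture.

```python
import numpy as np

def neuron(x, w, b, psi=np.tanh):
    # A single artificial neuron: phi(x) = psi(<w, x> + b)
    # x: input vector, w: weight vector, b: scalar bias, psi: activation
    return psi(np.dot(w, x) + b)

# Toy example with a two-dimensional input
print(neuron(np.array([1.0, -2.0]), np.array([0.5, 0.3]), 0.1))
```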

And what we've also seen is that one can train all these parameters w and b appearing in the neurons using stochastic gradient descent on a loss function which is suitable for the problem that you would like to solve.
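To make this concrete, here is a minimal sketch of stochastic gradient descent for a single linear neuron with a squared loss; the toy data, the learning rate, and the identity activation are illustrative assumptions and not part of the lecture.

```python
import numpy as np

# Toy regression data generated from a known linear rule
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -1.0]) + 0.5

w, b = np.zeros(2), 0.0
lr = 0.05
for epoch in range(50):
    for i in rng.permutation(len(X)):   # visit one random sample per step
        pred = np.dot(w, X[i]) + b      # phi(x) with the identity as activation
        err = pred - y[i]               # derivative of the loss 0.5 * err**2
        w -= lr * err * X[i]            # gradient step for the weights
        b -= lr * err                   # gradient step for the bias

print(w, b)  # w should end up close to [2., -1.] and b close to 0.5
```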

However, what we have not seen so far, and what we do not know yet, is, first of all: can one give a theoretical reason for why neural networks perform so well? That is, can one show theoretically that neural networks can solve any task, or, in mathematical language, that they can approximate any given function?

And a second question you might ask yourself is: does stochastic gradient descent converge to something which is actually useful in the end?

And in today's lecture, I will try to answer the first question.

And so I will show you a proof of a theorem stating that a very simple neural network can approximate any given function under some conditions.

And this gives a sort of theoretical explanation for why they perform so well in many applications.

These theorems, which tell you that neural networks can approximate any given function that you would like them to approximate, are typically called universal approximation theorems.

And there is not one single universal approximation theorem.

Instead, there is a whole family of such theorems, depending on the specific context, for instance the specific form of the network that is investigated.

And so the oldest theorems of that kind concern neural networks with arbitrary width but bounded depth. Typically, you would just have a neural network with one hidden layer, as depicted down here.

So on the left hand side, you have the input layer, then you have one hidden layer where

all the magic happens, and then you have an output layer.
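A minimal sketch of such a one-hidden-layer network in Python with NumPy could look as follows; the function name, the tanh activation, and the toy dimensions are illustrative assumptions, not prescribed by the lecture.

```python
import numpy as np

def shallow_net(x, W1, b1, w2, b2, psi=np.tanh):
    # One hidden layer of arbitrary width, followed by a linear output layer:
    #   f(x) = sum_k w2[k] * psi(<W1[k], x> + b1[k]) + b2
    hidden = psi(W1 @ x + b1)       # hidden layer: where "all the magic happens"
    return np.dot(w2, hidden) + b2  # linear read-out

# Toy example: input dimension 3, hidden width 5
rng = np.random.default_rng(1)
x = rng.normal(size=3)
print(shallow_net(x, rng.normal(size=(5, 3)), rng.normal(size=5),
                  rng.normal(size=5), 0.0))
```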

And so already in the 1980s and 1990s, people proved universal approximation theorems for these very simple network structures.

However, as you've seen, in applications one is typically more interested in deep neural networks. This is also very important in the context of deep learning, which gets its name from the fact that it uses neural networks with many layers.

And so indeed, one can also prove universal approximation theorems for neural networks

with arbitrary depth, but bounded width.

So you restrict yourself to a fixed number of neurons per layer, but then you take a

lot of layers.

And in this context, one is also able to show that such networks can approximate any given

function.
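For comparison, here is a minimal sketch of such a deep but narrow network in Python with NumPy; again, the function name, the tanh activation, and the toy dimensions are illustrative assumptions, not something taken from the lecture.

```python
import numpy as np

def deep_narrow_net(x, weights, biases, psi=np.tanh):
    # Fixed (bounded) width per layer, arbitrarily many layers: every entry of
    # `weights`/`biases` except the last defines one hidden layer, and the
    # last pair acts as a linear output layer.
    h = x
    for W, b in zip(weights[:-1], biases[:-1]):
        h = psi(W @ h + b)               # one narrow hidden layer
    return weights[-1] @ h + biases[-1]  # linear output layer

# Toy example: input dimension 2, width 4, ten hidden layers, scalar output
rng = np.random.default_rng(2)
width, depth = 4, 10
Ws = [rng.normal(size=(width, 2))] + \
     [rng.normal(size=(width, width)) for _ in range(depth - 1)] + \
     [rng.normal(size=(1, width))]
bs = [rng.normal(size=width) for _ in range(depth)] + [np.zeros(1)]
print(deep_narrow_net(rng.normal(size=2), Ws, bs))
```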

And this has been done mostly in the 2010s and 2020s.

Part of a video series

Presenter: Leon Bungert

Access: Open access

Duration: 00:44:32 min

Recording date: 2021-05-14

Uploaded: 2021-05-14 22:36:57

Language: en-US
