So, welcome to this lecture on universal approximation with neural networks.
In the last couple of lectures, we've already learned a lot about neural networks, in particular where they are applied and how one can train them.
As we have seen, neural networks perform very well on many different tasks, for instance classification, image processing, natural language processing, but also gaming, and so on. And this explains why they are so popular nowadays: they surpass the classical methods which had been applied to these tasks before.
So, we've already seen that neural networks are composed of artificial neurons; that is why they are called neural networks. Such a neuron has a simple mathematical form, let's call it phi(x), where x is the input: it is given by some function psi, called the activation function, applied to the inner product of a weight vector w with the input x, plus a bias b, which is just a number. In short, phi(x) = psi(<w, x> + b).
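To make this concrete, here is a minimal sketch of such a neuron in Python; the function name, the example numbers, and the choice of tanh as activation are purely illustrative and not taken from the lecture.

```python
import numpy as np

def neuron(x, w, b, psi=np.tanh):
    """A single artificial neuron: phi(x) = psi(<w, x> + b).

    psi is the activation function (tanh is just one possible choice),
    w is the weight vector and b is the scalar bias.
    """
    return psi(np.dot(w, x) + b)

# Example: a neuron with two inputs (values chosen arbitrarily)
w = np.array([0.5, -1.0])
b = 0.1
x = np.array([1.0, 2.0])
print(neuron(x, w, b))
```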
And what we've also seen is that one can train all these parameters w and b, which appear in these neurons, using stochastic gradient descent on a loss function that is suitable for the problem you would like to solve.
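Schematically, with all trainable parameters collected in a vector theta (notation added here for illustration, not from the lecture), one common form of such a stochastic gradient step on the loss L is

\[ \theta_{k+1} = \theta_k - \eta \, \nabla_\theta L(\theta_k), \]

where eta > 0 is the learning rate and the gradient is evaluated on a randomly chosen mini-batch of the training data.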
However, what we have not seen so far, and what we do not know yet, is, first of all: can one give a theoretical reason for why neural networks perform so well? That is, can one show theoretically that neural networks can solve any task, or, in mathematical language, that they can approximate any given function? And a second question you might ask yourself is whether stochastic gradient descent converges to something that is actually useful in the end.
And in today's lecture, I will try to answer the first question. So I will show you the proof of a theorem stating that a very simple neural network can approximate any given function, under some conditions. And this gives a sort of theoretical explanation of why they perform so well in many applications.
These theorems, which tell you that neural networks can approximate any given function that you would like them to approximate, are typically called universal approximation theorems. And there is not one single universal approximation theorem; instead, there is a whole family of such theorems, depending on the specific context, for instance on the specific form of the network that is investigated.
The oldest theorems of that kind concern neural networks with arbitrary width but bounded depth. Typically, you would just have a neural network with one hidden layer, as depicted down here.
So on the left hand side, you have the input layer, then you have one hidden layer where
all the magic happens, and then you have an output layer.
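As a rough sketch, assuming the same notation as above plus an output weight vector a and an output bias c (again illustrative, not the lecture's notation), such a one-hidden-layer network can be written down in a few lines of Python:

```python
import numpy as np

def shallow_network(x, W, b, a, c, psi=np.tanh):
    """One-hidden-layer network: f(x) = a^T psi(W x + b) + c.

    Each row of W together with the corresponding entry of b defines
    one hidden neuron; a and c form the linear output layer; psi is
    applied componentwise.
    """
    hidden = psi(W @ x + b)   # hidden layer: one neuron per row of W
    return a @ hidden + c     # linear output layer

# Example: 3 inputs, 5 hidden neurons, scalar output (random weights)
rng = np.random.default_rng(0)
W = rng.normal(size=(5, 3))
b = rng.normal(size=5)
a = rng.normal(size=5)
c = 0.0
x = np.array([1.0, -2.0, 0.5])
print(shallow_network(x, W, b, a, c))
```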
And so already in the 1980s and 1990s, people proved universal approximation theorems for these very simple network structures.
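One common formulation of such a result, roughly in the style of the classical theorems of Cybenko (1989) and Hornik, Stinchcombe and White (1989), reads as follows; the precise assumptions on psi differ between the various versions. Consider all functions of the form

\[ f(x) = \sum_{i=1}^{N} a_i \, \psi(\langle w_i, x \rangle + b_i), \qquad N \in \mathbb{N},\ a_i, b_i \in \mathbb{R},\ w_i \in \mathbb{R}^d. \]

For a suitable activation function psi, for instance a continuous sigmoidal one, this set of functions is dense in the continuous functions on any compact subset of \(\mathbb{R}^d\) with respect to the uniform norm; that is, every continuous function can be approximated arbitrarily well by such one-hidden-layer networks.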
However, as you have seen, in applications one is typically more interested in deep neural networks. This is also important in the context of deep learning, which has its name from the fact that it uses neural networks with a lot of layers.
And so indeed, one can also prove universal approximation theorems for neural networks
with arbitrary depth, but bounded width.
So you restrict yourself to a fixed number of neurons per layer, but then you take a
lot of layers.
And in this context, one is also able to show that such networks can approximate any given
function.
And this has been done mostly in the 2010s and 2020s.
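As a sketch of what such a fixed-width, deep architecture looks like, here is a Python illustration with arbitrary choices of width, depth and weights; this is not the construction used in those theorems, just the general shape of the network.

```python
import numpy as np

def deep_narrow_network(x, layers, psi=np.tanh):
    """Deep network of fixed width: apply an affine map followed by the
    activation psi for each hidden layer, then a final affine output map.

    `layers` is a list of (W, b) pairs; all hidden layers have the same
    (small) number of neurons, but the list can be arbitrarily long.
    """
    h = x
    for W, b in layers[:-1]:
        h = psi(W @ h + b)        # hidden layer of fixed width
    W_out, b_out = layers[-1]
    return W_out @ h + b_out      # linear output layer

# Example: input dimension 3, width 4, ten hidden layers, scalar output
rng = np.random.default_rng(1)
layers = [(rng.normal(size=(4, 3)), rng.normal(size=4))]
layers += [(rng.normal(size=(4, 4)), rng.normal(size=4)) for _ in range(9)]
layers += [(rng.normal(size=(1, 4)), rng.normal(size=1))]
x = np.array([1.0, -2.0, 0.5])
print(deep_narrow_network(x, layers))
```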
Presenter: Leon Bungert
Access: Open access
Duration: 00:44:32 min
Recording date: 2021-05-14
Uploaded: 2021-05-14 22:36:57
Language: en-US