Okay, good. So let's get started. Today I will start with a recap as usual,
and then we are going to understand the very important concept of gradient descent and
the concept of backpropagation. So these are the heart of neural network training. So today
and next lecture will be super important to understand the basics of neural network training.
But before we go there, I will just repeat what we said at the end of last time. So we have a
very simple neural network that would be a collection of neurons that are connected by
variable strength connections, and one of the simplest examples you can have is shown here.
So you have, say, two input neurons, with values y1 and y2, and there's one output neuron
that you want to calculate, and the calculation consists of two steps. We said last time there's
a linear step and a non-linear step. So the linear step is a simple weighted superposition
of the input values. So you take y1, multiply it with some weight, let's call it w1. You take y2,
multiply it with some weight, let's call it w2. You add some bias term b, that's just a constant,
and you add them all together, and that is a linear weighted superposition of the input values.
So typically we call it z. z is also sometimes called the pre-activation, because the values
of neurons are called activations, in the sense of how active the neurons in your brain are.
y1 and y2 are examples of activation values, and z is not itself an activation value, it is just
what you have before you apply the non-linear function. And the non-linear function f of z,
we said, can be almost anything. Typically it is a monotonically rising function. The only thing
we really require is that it is non-linear, because it is this combination of linear and
non-linear functions that makes neural networks really powerful.
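To make these two steps concrete, here is a minimal Python sketch of this single neuron. The variable names y1, y2, w1, w2 and b follow the notation above; the particular numerical values are just made-up examples for illustration, not values from the lecture.

```python
import numpy as np

def sigmoid(z):
    # Rounded step function: smooth transition from 0 (negative z) to 1 (positive z)
    return 1.0 / (1.0 + np.exp(-z))

# Example input activations and (arbitrarily chosen) weights and bias
y1, y2 = 0.5, -1.2
w1, w2, b = 0.8, 0.3, 0.1

# Linear step: weighted superposition of the inputs, the pre-activation z
z = w1 * y1 + w2 * y2 + b

# Non-linear step: apply the activation function to get the output neuron's value
output = sigmoid(z)
print(z, output)
```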
Okay, so the action of the linear part is shown here graphically. The blue
plane represents exactly this function z as a function of the two input coordinates, y1 and y2.
And now you apply the activation function f of z. A typical activation function is the sigmoid,
which, remember, is a rounded step function: it distinguishes between negative and positive
values of z, pushing the negative values towards zero and the positive values towards one, with
a smooth transition in between. And so this is the result of applying this very basic network:
in the y1-y2 plane there is a smooth transition between zero and one. Okay, and then we already went
through examples where we said let's have more neurons for the input maybe, or more neurons per
layer, and not only one output layer but also multiple hidden layers, and then we would go through
this whole procedure step by step. So from one layer to the next we do this linear superposition
and apply the non-linear function, and then again a linear superposition and the non-linear
function, stepping from the input layer to the output layer until we are
finished. Okay, so that's the basic network structure. Are there still questions remaining
from last time? Not at the moment, so let's go on. We also briefly discussed, but I am not going
to recall it here, the general concept of batch processing. So for numerical efficiency you don't
want to calculate just the output of your neural network for one single given input, but you want
to calculate in parallel the outputs for many given inputs. For example, you might want to classify
20 images in parallel, and that simply has efficiency advantages. Okay, let's skip this technical part.
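As a rough illustration of this layer-by-layer procedure and of batch processing, here is a small NumPy sketch of a forward pass. The architecture (2 inputs, 4 hidden neurons, 1 output), the random weights, and the batch of 20 samples are arbitrary placeholders, not values from the lecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Arbitrary example architecture: 2 inputs -> 4 hidden neurons -> 1 output
layer_sizes = [2, 4, 1]
weights = [rng.normal(size=(n_in, n_out))
           for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]
biases = [np.zeros(n_out) for n_out in layer_sizes[1:]]

# Batch processing: each row of y is one input sample (here, 20 samples in parallel)
y = rng.normal(size=(20, layer_sizes[0]))

# Step from the input layer to the output layer:
# linear superposition (pre-activation z), then the non-linear function
for W, b in zip(weights, biases):
    z = y @ W + b
    y = sigmoid(z)

print(y.shape)  # (20, 1): one output value per input sample
```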
And now we are going to ask the question: which functions can be approximated by a neural
network? Because if the neural network is supposed to be powerful, it had better be able to
approximate basically arbitrary functions, because the function mapping an image to a label that
says cat or dog presumably has to be a very complicated function. In fact, no one even knows
the explicit analytical expression for this function, otherwise you wouldn't need neural
networks, but it just demonstrates that if neural networks are supposed to be able to solve such
complicated tasks then they have to have the ability to approximate basically arbitrary smooth
functions. So that goes under the heading of expressivity and we will now go through it step
by step. We don't start with the most general case of course, we start with the simplest case. The
simplest case would be if the input of the neural network is a single number, so there's a single
input neuron, and the output is also a single number, so there's a single output neuron, and then