So, welcome everybody to today's lecture.
Last week we saw how a basic computational unit in an artificial neural network, known
as a perceptron, computes a mathematical function, and how these simple perceptrons
can be grouped together into layers to build up an artificial neural network.
So far we have not answered the question of how we can choose the free parameter vector theta
such that such an artificial neural network does anything meaningful.
So today's lecture will be about how we can train an artificial neural network and for
that we use a concept that is widely known and celebrated as backpropagation.
Before we start, let's remind ourselves of what we have done so far.
So down below you see a sketch of an artificial neural network with an input and an output
layer, and in between, in the hidden layers, the simple perceptrons are grouped to do
the computations.
And so far we said that each artificial neural network realizes a parameterized map f_theta
that maps from our input data to the output data living in a space Y.
But as we have already seen in our last video lecture, everything depends only on the vector theta
that contains all the free parameters, and these have to be chosen or found in a meaningful
way.
So for a hidden layer, the free parameters that we will have to estimate during our training
are the matrix W_k containing all the weights of this layer and also the bias vector b_k.
And we have to do this for all our layers within the artificial neural network such
that we can approximate our training data.
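To make the role of these parameters a bit more concrete, here is a minimal sketch of what one hidden layer computes; the NumPy representation and the sigmoid activation are assumptions chosen purely for illustration, not something fixed by the lecture.

```python
import numpy as np

def hidden_layer(z_prev, W_k, b_k):
    """One hidden layer of the network.

    W_k (weight matrix) and b_k (bias vector) are exactly the free
    parameters of this layer that end up inside theta and have to be
    found during training. The sigmoid activation is an assumption
    made here only for illustration."""
    a = W_k @ z_prev + b_k            # affine part: weights and bias
    return 1.0 / (1.0 + np.exp(-a))   # elementwise activation
```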
So before we get started we need some notation first.
So let's get our hands dirty before we dig into the machinery of training a neural
network.
So first of all, what we would like to have is a layer index L. This capital L will
always denote one hidden layer, and we will denote the preceding layer as L-1
in the following.
And it is clear that if we have a fully connected feed-forward network of depth D
(this is important), then our layer index will also run up to D.
The next thing we need is a notation for the nodes, and let's stay fixed in this
layer L down here.
Within this layer L we use the index i to denote our neurons.
So i always refers to layer L, but if we are looking into the preceding layer L-1,
we will use a running index denoted by j. So j now runs over the nodes in layer
L-1.
So why do we need these?
Well, we now have to define arbitrary weights and biases for any neuron in the artificial
neural network, and let's get started with an arbitrary weight, which we denote by w_ij^(L).
Of course the L is not a power; it is an upper index meaning that the weight belongs
to layer L down here.
And just to give you some intuition what that means: normally you would read this as
the weight of a connection from i to j, but here I chose to do it the other way
around, and you will see in a minute why that makes sense.
So always read this as the weight coming into node i from node j down here.
So read it in the reverse order, and we will see in a minute why this makes sense.
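To anticipate why this "into i from j" reading is convenient (the lecture defers the reason, so take this as a forward-looking sketch): with this convention, w_ij^(L) sits in row i and column j of the weight matrix W^(L), so the weighted sum of everything flowing into node i is just the i-th component of the matrix-vector product W^(L) z^(L-1). A small check, with hypothetical layer sizes chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # layer L has 3 nodes (index i), layer L-1 has 4 nodes (index j)
z_prev = rng.normal(size=4)   # outputs z_j^(L-1) of the preceding layer

i = 1
# Weighted input into node i, spelled out with the "into i from j" convention:
manual = sum(W[i, j] * z_prev[j] for j in range(4))

# ...is exactly the i-th component of the matrix-vector product W @ z_prev:
assert np.isclose(manual, (W @ z_prev)[i])
```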
The next thing is that we now introduce a new variable that you have not seen so far,
which we denote by z. This z has a lower subscript j, denoting that it is connected to our node
j down here, and an upper index L-1, meaning that it is the j-th node in layer L-1.
So what does that mean? z_j^(L-1) is the output of our neuron after performing all the computations
in this neuron. So it is the output of the neuron going in this direction to node i.
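Putting the two pieces of notation together, node i in layer L collects the outputs z_j^(L-1) of all nodes j in the preceding layer, weighted by w_ij^(L), adds its bias, and applies the activation. Here is a minimal per-node sketch; the bias name b[i] and the sigmoid activation are assumptions for illustration:

```python
import numpy as np

def node_output(i, W, b, z_prev):
    """Output of node i in layer L, given the outputs z_prev of layer L-1.

    W[i, j] plays the role of w_ij^(L), the weight coming into node i
    from node j, and b[i] is the bias of node i (names assumed here)."""
    a_i = W[i, :] @ z_prev + b[i]       # sum_j w_ij^(L) * z_j^(L-1) + bias
    return 1.0 / (1.0 + np.exp(-a_i))   # assumed sigmoid activation
```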