So, welcome everybody to today's lecture.
Last week we saw how a basic computational unit in an artificial neural network, known
as a perceptron, computes a mathematical function, and how these simple perceptrons
can be grouped together into layers to build up an artificial neural network.
So far we have not answered the question of how we can choose the free parameter vector theta
such that such an artificial neural network does anything meaningful.
So today's lecture will be about how we can train an artificial neural network and for
that we use a concept that is widely known and celebrated as backpropagation.
Before we start, let's remind ourselves of what we have done so far.
So down below you see a sketch of an artificial neural network with an input and an output
layer, and in between, in the hidden layers, the simple perceptrons are grouped to do
the computations.
And so far we said that each artificial neural network realizes a parameterized map f_theta
that maps from our input data to the output data living in a space Y.
But as we have already seen in our last video lecture, everything depends only on the vector theta
that contains all the free parameters, and these have to be chosen or found in a meaningful
way.
So for a hidden layer, the free parameters that we will have to estimate during our training
are the matrix W_k containing all the weights of this layer and also the bias vector b_k.
And we have to do this for all our layers within the artificial neural network such
that we can approximate our training data.
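To make the role of these parameters a bit more concrete, here is a minimal sketch of what one hidden layer computes; the NumPy representation and the sigmoid activation are assumptions chosen purely for illustration, not something fixed by the lecture.

```python
import numpy as np

def hidden_layer(z_prev, W_k, b_k):
    """One hidden layer of the network.

    W_k (weight matrix) and b_k (bias vector) are exactly the free
    parameters of this layer that end up inside theta and have to be
    found during training. The sigmoid activation is an assumption
    made here only for illustration."""
    a = W_k @ z_prev + b_k            # affine part: weights and bias
    return 1.0 / (1.0 + np.exp(-a))   # elementwise activation
```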
So before we get started we need some notation first.
So let's get our hands dirty before we dig into the machinery of training a neural
network.
So first of all, what we would like to have is a layer index L. This capital L will
always denote one hidden layer, and we will denote the preceding layer as L-1
in the following.
And it is clear that if we have a fully connected feed-forward network of depth D
(this is important), then our layer index will also run up to D.
The next thing we need is a notation for the nodes, and let's stay fixed in this
layer L down here.
Within this layer L we use the index i to denote our neurons.
So i always refers to layer L, but if we are looking into the preceding layer L-1,
we will use a running index denoted by j. So j now runs over the nodes in layer
L-1.
So why do we need these?
Well, we now have to define arbitrary weights and biases for any neuron in the artificial
neural network, and let's get started with an arbitrary weight, which we denote by w_ij^(L).
Of course the L is not a power; it is an upper index meaning that the weight belongs
to layer L down here.
And just to give you some intuition what that means: normally you would read this as
the weight of a connection from i to j, but here I chose to do it the other way
around, and you will see in a minute why that makes sense.
So always read this as the weight coming into node i from node j down here.
So read it in the reverse order, and we will see in a minute why this makes sense.
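To anticipate why this "into i from j" reading is convenient (the lecture defers the reason, so take this as a forward-looking sketch): with this convention, w_ij^(L) sits in row i and column j of the weight matrix W^(L), so the weighted sum of everything flowing into node i is just the i-th component of the matrix-vector product W^(L) z^(L-1). A small check, with hypothetical layer sizes chosen only for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))   # layer L has 3 nodes (index i), layer L-1 has 4 nodes (index j)
z_prev = rng.normal(size=4)   # outputs z_j^(L-1) of the preceding layer

i = 1
# Weighted input into node i, spelled out with the "into i from j" convention:
manual = sum(W[i, j] * z_prev[j] for j in range(4))

# ...is exactly the i-th component of the matrix-vector product W @ z_prev:
assert np.isclose(manual, (W @ z_prev)[i])
```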
The next thing is that we now introduce a new variable that you have not seen so far,
which we denote by z. This z has a lower subscript j, denoting that it is connected to our node
j down here, and an upper index L-1, meaning that it is the j-th node in layer L-1.
So what does that mean? z_j^(L-1) is the output of our neuron after performing all the computations
in this neuron. So it is the output of the neuron going in this direction to node i.
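Putting the two pieces of notation together, node i in layer L collects the outputs z_j^(L-1) of all nodes j in the preceding layer, weighted by w_ij^(L), adds its bias, and applies the activation. Here is a minimal per-node sketch; the bias name b[i] and the sigmoid activation are assumptions for illustration:

```python
import numpy as np

def node_output(i, W, b, z_prev):
    """Output of node i in layer L, given the outputs z_prev of layer L-1.

    W[i, j] plays the role of w_ij^(L), the weight coming into node i
    from node j, and b[i] is the bias of node i (names assumed here)."""
    a_i = W[i, :] @ z_prev + b[i]       # sum_j w_ij^(L) * z_j^(L-1) + bias
    return 1.0 / (1.0 + np.exp(-a_i))   # assumed sigmoid activation
```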