2 - Machine Learning for Physicists

Okay, good. So I think let's get started. So today I will start with a recap as usual,

and then we are going to understand the very important concept of gradient descent and

the concept of backpropagation. So these are the heart of neural network training. So today

and next lecture will be super important to understand the basics of neural network training.

But before we go there, I will just repeat what we said at the end of last time. So we have a

very simple neural network, which is a collection of neurons that are connected by variable-strength connections, and one of the simplest examples you can have is shown here.

So you have say two input neurons, values y1 and y2, and there's one output neuron

that you want to calculate, and the calculation consists of two steps. We said last time there's

a linear step and a non-linear step. So the linear step is a simple weighted superposition

of the input values. So you take y1, multiply it with some weight, let's call it w1. You take y2,

multiply it with some weight, let's call it w2. You add some bias term b, that's just a constant,

and you add them all together, and that is a linear weighted superposition of the input values.
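
Spelled out as a formula (just restating the words above, with w1 and w2 the two weights and b the bias), this linear weighted superposition is

\[
  w_1\, y_1 + w_2\, y_2 + b .
\]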

So typically we call it z. z is also sometimes called the pre-activation, because the values of neurons are called activations, like in your brain, how active the neurons are, so those are the activation values. y1 and y2 are examples of activation values, and z is not in itself an activation value, it's just what you have before you apply the

non-linear function. And the non-linear function f of z, we said, can be almost anything. Typically

it's a monotonically rising function. The only thing we really need is that it is non-linear, because that is what makes neural networks so powerful, this combination of linear and non-linear functions.
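
To make this recap concrete, here is a minimal sketch in Python/NumPy (not code from the lecture, just an illustration; the names w1, w2, b follow the notation above, and the sigmoid is one possible choice of f):

```python
import numpy as np

def sigmoid(z):
    # Rounded step function: close to 0 for very negative z, close to 1 for very positive z
    return 1.0 / (1.0 + np.exp(-z))

def neuron_output(y1, y2, w1, w2, b):
    # Linear step: weighted superposition of the inputs plus a bias (the "pre-activation")
    z = w1 * y1 + w2 * y2 + b
    # Non-linear step: apply the activation function f(z)
    return sigmoid(z)

# One output value for the example inputs (y1, y2) = (0.5, -1.0)
print(neuron_output(0.5, -1.0, w1=2.0, w2=1.0, b=0.1))
```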

Okay, so the action of the linear part is shown here graphically. The blue

plane represents exactly this function z as a function of the two input coordinates, y1 and y2.

And now you apply the activation function f of z. A typical activation function might be the sigmoid, which, remember, is a rounded step function, and it distinguishes between negative and positive values of z: in that case it essentially cuts off the negative values, turning them to zero, while all the positive values tend to go to one, with a smooth transition in between. And so this is the result of applying this very basic network: in the y1-y2 plane there is a smooth transition between zero and one. Okay, and then we already went

through examples where we said let's have more neurons in the input maybe, or more neurons in the other layers, not only one output layer but also multiple hidden layers, and then we would go through this

whole procedure step by step. So from one layer to the next we do this linear superposition,

applying the non-linear function, and then again linear superposition,

applying the non-linear function, stepping from the input layer to the output layer until we are

finished. Okay, so that's the basic network structure. Are there still questions remaining

from last time? Not at the moment, so let's go on. We also briefly discussed, but I am not going

to recall it here, the general concept of batch processing. So for numerical efficiency you don't

want to calculate just the output of your neural network for one single given input, but you want

to calculate in parallel the outputs for many given inputs. For example, you want to classify

20 images in parallel and that just has advantages. Okay, let's skip this technical part.
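
As an illustration of both points, the layer-by-layer forward pass and batch processing, here is a small sketch (again not code from the lecture; the layer sizes and the use of NumPy matrix multiplication are my own choices for the example):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward_pass(y, weights, biases):
    # y has shape (batch_size, n_inputs), so many inputs are processed in parallel;
    # weights[l] has shape (n_in, n_out) for layer l, biases[l] has shape (n_out,).
    for W, b in zip(weights, biases):
        z = y @ W + b      # linear step: pre-activations for the whole batch at once
        y = sigmoid(z)     # non-linear step: activations of the next layer
    return y

# Example: a batch of 20 inputs with 2 features, one hidden layer of 5 neurons, 1 output neuron
rng = np.random.default_rng(0)
weights = [rng.normal(size=(2, 5)), rng.normal(size=(5, 1))]
biases = [np.zeros(5), np.zeros(1)]
batch = rng.normal(size=(20, 2))
print(forward_pass(batch, weights, biases).shape)   # -> (20, 1)
```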

And now we are going to ask the question: which functions can be approximated by a neural network? Because if the neural network is supposed to be powerful, it had better be able to approximate basically arbitrary functions, because the function mapping an image to a label that

says cat or dog presumably has to be a very complicated function. In fact, no one even knows

the explicit analytical expression for this function, otherwise you wouldn't need neural

networks, but it just demonstrates that if neural networks are supposed to be able to solve such

complicated tasks then they have to have the ability to approximate basically arbitrary smooth

functions. So that goes under the heading of expressivity and we will now go through it step

by step. We don't start with the most general case of course, we start with the simplest case. The

simplest case would be if the input of the neural network is a single number, so there's a single

input neuron, and the output is also a single number, so there's a single output neuron, and then
