The following content has been provided by the University of Erlangen-Nürnberg.
Okay, hello everyone. Good evening, and welcome back to the second lecture. The plan for today is the following. I will first remind you of some of the things we said last time about the basics of neural networks. Then I will tell you about something that is a bit technical but very useful: a trick for processing many samples in parallel. After that I will teach you something about the universality of neural networks, which means that if you give me any nonlinear function, I can in principle produce a neural network that computes this function for you. And then we'll start discussing training: how do you teach a neural network?
So, just to remind you of what we said last time: this is a neural network. Its purpose is to produce some output given some input; the input might be, for example, a picture, and the output could be a description of this picture. You want to teach the network by showing it very many training examples, without ever having to hand-code an algorithm yourself. Still, you hope that in the end the neural network will have learned the mapping from input to output.
So then we discussed what a simple neural network looks like. Let me just switch to the description right away. This is a very simple neural network where you have some neurons that encode the input values. Each blue dot represents a neuron, or a unit, and it will have a value. The values of the input neurons you prescribe, these are the input values; the values of the output neurons the neural network computes for you. It's a very simple network because it does not have any hidden layers; rather, it goes straight from input to output. In general, any neural network works by a combination of linear mappings and very simple nonlinear functions. The linear mapping I've written down here: if you label the output neurons in the upper layer by j and the input neurons by k, then what happens is that you take some arbitrary weighted superposition of the input neurons, possibly add some offset, or bias term as we call it, and you get the value of an output neuron: z_j = sum_k w_jk y_k + b_j. The weights in this superposition depend on both the input and the output neuron, so these weights are, so to speak, the strengths of the connections that we are showing here. Both for artificial neural networks and for real biological neural networks, training means that these weights change over time. Now, one big part of all of this will be that linear algebra can make things simpler, at least in terms of notation. This is obviously a matrix applied to a vector, so I can also write it as the matrix W times the vector y^in, plus another vector b that contains the bias values: z = W y^in + b. It's rather simple if you write it like that. After this linear superposition you then apply a nonlinear function: for each output neuron, you plug the value z_j that you got into some nonlinear function. This nonlinear function is typically very simple; it could be a smooth step function, or it could be something that is zero for negative input values and rises linearly for positive values. There are many possible nonlinear functions; the important thing is only that you have a nonlinear function.
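In NumPy-style Python, such a layer can be sketched roughly as follows; this is a minimal sketch, and the names sigmoid, relu, layer, y_in, W, and b are illustrative choices rather than the lecture's own code:

```python
import numpy as np

def sigmoid(z):
    # smooth step function (what a physicist would call a Fermi function)
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # zero for negative inputs, rising linearly for positive inputs
    return np.maximum(0.0, z)

def layer(y_in, W, b, f=sigmoid):
    # linear superposition: z_j = sum_k W[j, k] * y_in[k] + b[j]
    z = np.dot(W, y_in) + b
    # apply the simple nonlinear function elementwise
    return f(z)
```

Any other simple nonlinearity could be plugged in as f in the same way; the important point is only that some nonlinear function is applied after the linear step.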
Okay, so then I explained to you that we are doing this with Python. At the end of this lecture, of course, we will discuss whether you have all succeeded in installing stuff, or whether you still have some questions. And then we said: how do you program this little piece of linear algebra? You would take a matrix W and form the matrix-vector product, which in Python you can write as dot(W, y), and the output of this operation will of course have dimension N_out. Just like for any matrix-vector product, the index you sum over disappears, and what remains is the output dimension.
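As a quick check of this shape bookkeeping, here is a small sketch, assuming NumPy and arbitrary sizes:

```python
import numpy as np

N_in, N_out = 3, 2
W = np.random.randn(N_out, N_in)   # weight matrix, shape (N_out, N_in)
y = np.random.randn(N_in)          # input vector, shape (N_in,)

z = np.dot(W, y)                   # the summed-over index of size N_in disappears
print(z.shape)                     # prints (2,), i.e. the remaining dimension is N_out
```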
And then we went on to implement this. So we said, for example: take this network; we have three input neurons and two output neurons, and we want to create a matrix of weights W, which in this case has dimension two by three, because it is output dimension by input dimension. You also have the bias vector, whose size is just given by the number of output neurons, since each of them needs its own bias. In this particular example I made them random, I took some arbitrary vector for the input values, and then I simply applied both the linear-algebra operation and the subsequent nonlinear function. In this case we took a smooth step function, also called a sigmoid; a physicist would say a Fermi distribution.
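Putting these pieces together, the example just described might look roughly like this in NumPy; the random ranges and the particular input vector are placeholders, not the actual numbers from the lecture:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

N_in, N_out = 3, 2                                     # three input neurons, two output neurons

W = np.random.uniform(-1.0, 1.0, size=(N_out, N_in))   # random weight matrix, shape (2, 3)
b = np.random.uniform(-1.0, 1.0, size=N_out)           # one random bias per output neuron
y_in = np.array([0.2, -0.5, 1.0])                      # some arbitrary input values

z = np.dot(W, y_in) + b                                # linear step
y_out = sigmoid(z)                                     # nonlinear step
print(y_out)                                           # two output values, each between 0 and 1
```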
Are there still any questions about this from last time? Okay, so here's a graphical visualization which I mentioned briefly last time. So suppose we have a network that has two input values and