Okay, good evening. Welcome to the second lecture of the machine learning course. Regarding organizational details, such as when the exam will take place, we will discuss those at the end of this lecture. But for now I just want to briefly remind you of what we did last time.

So we introduced the general structure of a neural network. It is composed of neurons
that are connected to each other. There is a set of neurons at the bottom of the network, the input neurons: you feed in your input values and then, layer by layer, you calculate the output values. The way this is done is shown here, just to remind you. There are two steps in each piece of the calculation: a linear step and a non-linear step. In the linear step you take all the neuron values in the lowest layer, form a weighted superposition of these values, as shown here, and feed it into a neuron in the upper layer; that would be the value called z here. But this alone is not enough; it would not give you a powerful neural network. So in addition you apply a non-linear function, here called f, to each of the values that you calculated previously in the linear operation. And that's it. Then you just proceed step by step, layer by layer, doing the same thing: linear superposition, non-linear function, linear superposition, non-linear function.
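Just to pin that recap down in a formula (this is my own shorthand, not necessarily the notation on the slides, and I am assuming the usual convention that also includes a bias term b, which the recap above does not spell out), one such step from one layer to the next reads:

```latex
z_j = \sum_k w_{jk}\, y_k + b_j \,, \qquad y_j^{\text{new}} = f(z_j)
```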
Then I showed you how we would do this in Python. I pointed out that the operation you perform in the linear step is essentially a matrix-vector multiplication. So in a programming language like Python, which has nice linear algebra capabilities, you can literally apply a matrix to a vector in a single step, and that gives you the output; then there is just the additional step of applying the non-linear function. We looked at how this would be done in Python.
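As a rough sketch of what that looks like in code (the variable names are my own, and I am assuming NumPy with a sigmoid activation; the lecture's actual notebook may differ):

```python
import numpy as np

def sigmoid(z):
    # smooth non-linearity: large negative z -> 0, large positive z -> 1
    return 1.0 / (1.0 + np.exp(-z))

def layer_step(y, W, b):
    # one layer step for a single input vector y of size n_in,
    # with W of shape (n_out, n_in) and b of shape (n_out,)
    z = np.dot(W, y) + b      # linear step: matrix-vector product plus bias
    return sigmoid(z)         # non-linear step: apply f elementwise
```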
I also visualized what happens here. If you have only two input values, so two input neurons and one output neuron, you can plot the two input values in a two-dimensional plane, with the output value on the vertical axis. Here I plotted only the linear part, which obviously must be a plane: a linear function of two variables gives a plane in such a representation. But then I apply the non-linear function, and the non-linear function can be any non-linear function. One famous example is the sigmoid, which pushes values far below zero down towards zero and saturates values far above zero at one, with a smooth transition in between. That is what is shown here. So that would be one step, still relatively boring, but then you can proceed layer by layer and you get the interesting behavior that we already observed last time.
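If you want to reproduce that picture yourself, a minimal sketch could look like this (again assuming NumPy and Matplotlib; the weight and bias values here are arbitrary example numbers, not taken from the lecture):

```python
import numpy as np
import matplotlib.pyplot as plt

# grid of the two input values
y1, y2 = np.meshgrid(np.linspace(-3, 3, 100), np.linspace(-3, 3, 100))

w1, w2, b = 1.0, 0.5, 0.0          # arbitrary example weights and bias
z = w1 * y1 + w2 * y2 + b          # linear part: a plane over the input plane
out = 1.0 / (1.0 + np.exp(-z))     # sigmoid squashes the plane into [0, 1]

ax = plt.figure().add_subplot(projection='3d')
ax.plot_surface(y1, y2, out)
ax.set_xlabel('input 1'); ax.set_ylabel('input 2'); ax.set_zlabel('output')
plt.show()
```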
Now there is one extra thing I have to tell you about, to make things efficient. In the end we will be interested in calculating the output of the network not only for a single input, but really for hundreds of inputs in parallel, and this can also be done very efficiently using the same linear algebra, matrix-vector notation. That is what I want to talk about here. When we try to get the output of the network for 100 samples in parallel, we call these 100 samples together a batch of samples; it is just a set of samples that we want to apply the network to. I have drawn the situation here: you would have, say, a network with three input neurons, so each sample consists of three values, and you want to feed many samples in parallel into the network without just doing a loop. Of course, you could always loop over all these samples, produce the corresponding outputs and store them, but you want to do it without a loop, because at least in an interpreted language like Python a loop would be terribly inefficient. The way to do this is simply to expand our arrays. I will go through the details, but all the arrays will acquire an extra index, and this extra index counts the sample: if we have 100 samples, the extra index runs from 0 to 99. This is shown here in some more detail. Usually we would have one sample, which would be a vector of size n_in, where n_in is the number of input neurons of my network; that is what we already discussed. But now, if I have many samples, this becomes a two-dimensional array, with the extra index counting the samples.
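To make this concrete, here is a minimal sketch of how such a batch could be pushed through one layer in a single call (my own variable names, and I am assuming the samples are stored along the first axis of the array):

```python
import numpy as np

n_in, n_out, batch_size = 3, 5, 100

# each row of Y_in is one sample, so Y_in has shape (batch_size, n_in)
Y_in = np.random.uniform(-1, 1, size=(batch_size, n_in))

W = np.random.randn(n_out, n_in)   # weights, the same as for a single sample
b = np.zeros(n_out)                # biases

# linear step for the whole batch at once: one matrix-matrix product
Z = np.dot(Y_in, W.T) + b          # shape (batch_size, n_out)

# non-linear step, applied elementwise
Y_out = 1.0 / (1.0 + np.exp(-Z))   # shape (batch_size, n_out)
```

Each row of Y_out is then the layer output for the corresponding sample, so no explicit Python loop over the samples is needed.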
This is a course introducing modern techniques of machine learning, especially deep neural networks, to an audience of physicists. Neural networks can be trained to perform diverse challenging tasks, including image recognition and natural language processing, just by training them on many examples. Neural networks have recently achieved spectacular successes, with their performance often surpassing humans. They are now also being considered more and more for applications in physics, ranging from predictions of material properties to analyzing phase transitions. We will cover the basics of neural networks, convolutional networks, autoencoders, restricted Boltzmann machines, and recurrent neural networks, as well as the recently emerging applications in physics.