Good evening everyone.
I want to start with a short reminder of what we actually learned at the end of last lecture.
We were looking into image recognition, and the particular test case is a very famous
and important one: recognizing handwritten digits.
And so we discussed how this is done in principle: you construct a neural network
that takes an image as input and has several output neurons, each of which corresponds
to one particular digit.
So if it recognizes the digit three, then the neuron that belongs to the digit
three will fire.
And so we learned several different things like one-hot encoding and categorical cross-entropy
as a cost function.
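To make this concrete, here is a minimal sketch of such a network in Keras, assuming flattened 28x28 grayscale images; the layer sizes and the use of Keras are my own illustrative choices, not necessarily the lecture's exact setup.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

# One output neuron per digit 0..9; the softmax makes the outputs behave
# like probabilities, so the neuron of the recognized digit "fires".
model = keras.Sequential([
    layers.Input(shape=(784,)),                # 28x28 image, flattened
    layers.Dense(100, activation="relu"),      # hidden layer (size chosen arbitrarily)
    layers.Dense(10, activation="softmax"),    # output: one neuron per digit
])

# Categorical cross-entropy as the cost function, which expects
# one-hot encoded labels: digit 3 -> [0,0,0,1,0,0,0,0,0,0]
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# One-hot encoding of integer labels:
labels = np.array([3, 1, 4])
labels_one_hot = keras.utils.to_categorical(labels, num_classes=10)
```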
But in the end, when we implemented it, we stumbled upon some interesting behavior.
So these are some images of digits that the network actually misclassifies.
And if we count, for the numerical experiment that we have been doing here, how
many are misclassified, it was about 77 percent.
And that was in spite of the fact that the accuracy during training seemed
very, very good, with apparently only three percent error on the training samples.
And so that was a little bit of a mystery, which is resolved by recognizing that you
have to be very honest in assessing how well you are doing.
If you only assess the quality on the training examples that you have already trained
the network on, then this does not give you a fair assessment of the quality of the network.
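As a small illustration of this point (a sketch with hypothetical array names, building on the model above): evaluate the same trained model once on the training examples and once on examples it has never seen, and compare the two numbers.

```python
# Hypothetical arrays: (x_train, y_train) were used for training,
# (x_unseen, y_unseen) were never shown to the network.
train_loss, train_acc = model.evaluate(x_train, y_train, verbose=0)
unseen_loss, unseen_acc = model.evaluate(x_unseen, y_unseen, verbose=0)

print(f"accuracy on training examples: {train_acc:.3f}")
print(f"accuracy on unseen examples:   {unseen_acc:.3f}")
# Only the second number is an honest assessment of the network's quality.
```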
So what people do (again, we discussed this last time and it is summarized again here)
is the following: they have a training set on which they train the network, and then they
have a validation set which the network never sees for the training, it is never trained on,
but which you can always use to assess the accuracy of the network during training and
watch it getting better and better.
And then, independently of that, there are the images to which the network will finally
be applied, which form the test set, so that is completely independent of everything.
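In code, this three-way split could look as follows; this is a sketch, with the split sizes and the Keras call my own illustrative assumptions, and with x and y denoting the full arrays of images and one-hot labels.

```python
# Illustrative split sizes for a dataset of roughly MNIST's size (70000 images).
n_train, n_val = 50000, 10000

x_train, y_train = x[:n_train], y[:n_train]
x_val,   y_val   = x[n_train:n_train + n_val], y[n_train:n_train + n_val]
x_test,  y_test  = x[n_train + n_val:], y[n_train + n_val:]   # kept untouched until the very end

# The validation set is only evaluated after each epoch, never trained on.
history = model.fit(x_train, y_train,
                    epochs=30, batch_size=128,
                    validation_data=(x_val, y_val))
```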
Okay, and so if you adopt this, if you have a validation set on which you do not train,
then you see the interesting behavior that is shown here.
The accuracy on the training data may keep increasing over time as you train more and
more on these training samples, but the accuracy on the validation data may level off
at a much lower value, and it might even decrease again. It is not so easily visible
here, but it is indicated by the arrow, and that is of course bad: you are training more
and more, yet the accuracy on the validation data even decreases.
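The curves just described can be plotted directly from the training history; this is a sketch, and the metric key names are those of recent Keras versions (older ones use "acc"/"val_acc").

```python
import matplotlib.pyplot as plt

# 'history' is the object returned by model.fit above.
plt.plot(history.history["accuracy"], label="training accuracy")
plt.plot(history.history["val_accuracy"], label="validation accuracy")
plt.xlabel("epoch")
plt.ylabel("accuracy")
plt.legend()
plt.show()
# Typically the training curve keeps rising while the validation curve
# levels off at a lower value and may eventually turn down again.
```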
So the question is what's going on.
And what's going on is known as the phenomenon of overfitting.
So what really happens is that the network basically memorizes these training examples.
You have shown these training examples again and again and again to the network, and so
at some point it really knows: oh, this picture, where the pixels are arranged in exactly
this order, that's the three, because someone told me many, many times that this picture
must be labeled three.
But that doesn't mean it can generalize to other pictures of a three that look a little
bit different, because the pixels are arranged in a slightly different shape.
So that's really bad; it's like a student who doesn't understand what they are doing and just
memorizes something, and then, if you ask a new question, they cannot answer.
So the solution is, first of all, to always have this honest assessment of the accuracy
by measuring it against the validation data, which the network is not being trained on.
And then you stop the training after the validation accuracy has reached its maximum,
because after that things are only getting worse: the network just keeps overfitting.
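One common way to automate this "stop at the maximum of the validation accuracy" rule is an early-stopping callback; the following is a sketch of my own, not necessarily the lecture's implementation, reusing the model and split from the earlier sketches.

```python
from tensorflow.keras.callbacks import EarlyStopping

# Stop once the validation accuracy has not improved for 5 epochs,
# and roll back to the weights of the best epoch seen so far.
early_stop = EarlyStopping(monitor="val_accuracy",
                           patience=5,
                           restore_best_weights=True)

model.fit(x_train, y_train,
          epochs=100, batch_size=128,
          validation_data=(x_val, y_val),
          callbacks=[early_stop])
```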