Welcome, everybody, to today's mini online lecture.
So last time we did not really focus on the details of how to choose proper training data for our artificial neural networks and how to determine the optimal parameters.
So today we will shed some more light on this issue.
So first of all, let us recall where we stopped in the last video lecture.
We defined a loss function that helps us measure the performance of our artificial neural network, and one example that we used was the mean squared error function.
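Just to make this concrete, here is a minimal sketch in Python of what such a mean squared error loss could look like (the array names and numbers are purely illustrative, not taken from the lecture slides):

```python
import numpy as np

def mse_loss(y_pred, y_true):
    """Mean squared error between the network outputs and the targets."""
    # Average the squared differences over all training samples.
    return np.mean((y_pred - y_true) ** 2)

# Tiny example with made-up numbers.
y_pred = np.array([0.9, 0.2, 0.7])
y_true = np.array([1.0, 0.0, 1.0])
print(mse_loss(y_pred, y_true))  # -> about 0.047
```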
So the loss function helped us train our neural network, and we investigated the issue of non-convexity and also discussed the reasons why these loss functions are, in general, non-convex.
Just as a reminder, there was the argument that the neurons in the hidden layers can be arbitrarily permuted without changing the function the network computes, so many different parameter configurations are equally good. This was the main argument for non-convexity.
And we said that unfortunately this loss function has many local minima.
So today I would like to argue that this does not have to be a disadvantage. It might even be an advantage, and I hope that in the end you realize that convergence to a local minimum can actually help artificial neural networks generalize to unseen data.
So training a network can be performed using an iterative algorithm instead of a direct solver, as we discussed last time. And the example that we discussed was the gradient descent scheme. And as we will see today, this might not be the optimal choice for artificial neural networks.
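As a reminder of how such an iterative scheme works, here is a minimal gradient descent sketch in Python; the quadratic toy loss and the step size are assumptions for illustration, not the exact setup from the last lecture:

```python
import numpy as np

def gradient_descent(w, loss_gradient, learning_rate=0.1, num_steps=100):
    """Iteratively move the parameters against the gradient of the loss."""
    for _ in range(num_steps):
        g = loss_gradient(w)        # gradient of the loss at the current parameters
        w = w - learning_rate * g   # take a step downhill
    return w

# Toy example: minimizing the loss ||w||^2, whose gradient is 2 * w.
w0 = np.array([3.0, -2.0])
w_opt = gradient_descent(w0, lambda w: 2.0 * w)
print(w_opt)  # ends up very close to the minimum at the origin
```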
So the questions that we would like to tackle today are, first, how can we choose the right data for training our artificial neural networks, and is there anything we need to keep in mind when choosing this data? And second, how can we optimize the free parameters in the case of a huge training data set, which is actually the norm nowadays for most modern applications?
For example, ImageNet consists of millions of images.
And as we pointed out last time, optimizing a neural network using backpropagation would require a backpropagation step for each of these training pairs just to perform a single update of the free parameters. So we will see different schemes today that help us in these situations.
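To see why this becomes expensive, here is a hedged sketch of one full-batch update: the gradient is accumulated over every single training pair before the free parameters change once. The function per_sample_gradient stands in for the backpropagation step of one training pair; it is a placeholder for illustration, not a real library routine:

```python
import numpy as np

def full_batch_update(w, data, per_sample_gradient, learning_rate=0.01):
    """A single gradient descent update computed over the whole training set."""
    g = np.zeros_like(w)
    for x, y in data:                      # one backpropagation step per training pair ...
        g += per_sample_gradient(w, x, y)
    g /= len(data)                         # average gradient over all pairs
    return w - learning_rate * g           # ... for just one update of the parameters

# Tiny illustration with a least-squares gradient for a linear model.
def lsq_grad(w, x, y):
    return 2.0 * (np.dot(w, x) - y) * x

data = [(np.array([1.0, 0.0]), 1.0), (np.array([0.0, 1.0]), -1.0)]
w_new = full_batch_update(np.zeros(2), data, lsq_grad)
print(w_new)
```

With millions of images, as in ImageNet, that inner loop has to run over the entire data set for every single parameter update.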
So first of all, how can you choose training data?
We didn't discuss this, so let's shed some light on this.
First of all, an ideal set of training data should cover all the interesting characteristics that are present in the data while not being biased towards particular classes. If you think about classification tasks, you would rather not want a classifier that is always drawn towards a certain class, but one that judges independently on a sound statistical basis.
So it should cover all classes, that's for sure.
I mean, if we miss any class, then we cannot expect our artificial neural network to know
about this class.
And the amount of data per class should somehow be related to its probability of occurrence.
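One simple way to check this is to compare the class frequencies in the training set with the occurrence probabilities you expect; here is a small sketch where the labels and the expected priors are made up purely for illustration:

```python
from collections import Counter

# Hypothetical training labels and the occurrence probabilities we expect.
labels = ["cat", "dog", "cat", "bird", "cat", "dog", "cat", "dog"]
expected_priors = {"cat": 0.5, "dog": 0.4, "bird": 0.1}

counts = Counter(labels)
total = len(labels)
for cls, prior in expected_priors.items():
    observed = counts.get(cls, 0) / total
    print(f"{cls}: observed {observed:.2f}, expected {prior:.2f}")
```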
Just to give you some impression of what this might look like, imagine the different classes as dots in some high-dimensional feature space. So these red dots are class one, and we choose some dots that lie pretty close to them as a second class. Then we would like to train a neural network that is capable of separating these two classes.
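You can picture the situation roughly like this: two point clouds that lie close to each other in feature space, sketched here in two dimensions with made-up cluster centres (this is only an illustration, not the data shown on the slide):

```python
import numpy as np

rng = np.random.default_rng(0)

# Class one ("red dots") and a second class centred nearby in feature space.
class_one = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
class_two = rng.normal(loc=[1.5, 1.0], scale=0.5, size=(50, 2))

# Stack everything into a training set of feature vectors and labels.
X = np.vstack([class_one, class_two])
y = np.array([0] * 50 + [1] * 50)
# A network trained on (X, y) should learn a boundary separating the two classes.
```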