Welcome, everybody, to today's mini online lecture.
So last time we did not really focus on the details of how to choose proper training data for our artificial neural networks and how to determine the optimal parameters.
So today we will shed some more light on this issue.
So first of all, let us recall where we stopped in the last video lecture.
We defined a loss function that helps us measure the performance of our artificial neural network, and one example that we used was the mean squared error function.
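Just to make this concrete, here is a minimal sketch in Python of what such a mean squared error loss could look like (the array names and numbers are purely illustrative, not taken from the lecture slides):

```python
import numpy as np

def mse_loss(y_pred, y_true):
    """Mean squared error between the network outputs and the targets."""
    # Average the squared differences over all training samples.
    return np.mean((y_pred - y_true) ** 2)

# Tiny example with made-up numbers.
y_pred = np.array([0.9, 0.2, 0.7])
y_true = np.array([1.0, 0.0, 1.0])
print(mse_loss(y_pred, y_true))  # -> about 0.047
```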
So the loss function helped us train our neural network, and we investigated the issue of non-convexity and also discussed the reasons why these loss functions are, in general, non-convex.
Just as a reminder, there was the argument that the neurons in the hidden layers can be arbitrarily permuted without changing the function the network computes, so many different parameter configurations are equally good. This was the main argument for non-convexity.
And we said that unfortunately this loss function has many local minima.
So today I would like to argue that this does not have to be a disadvantage. It might even be an advantage, and I hope that in the end you realize that convergence to a local minimum can actually help artificial neural networks generalize to unseen data.
So training a network can be performed using an iterative algorithm instead of a direct solver, as we discussed last time. And the example that we discussed was the gradient descent scheme. And as we will see today, this might not be the optimal choice for artificial neural networks.
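As a reminder of how such an iterative scheme works, here is a minimal gradient descent sketch in Python; the quadratic toy loss and the step size are assumptions for illustration, not the exact setup from the last lecture:

```python
import numpy as np

def gradient_descent(w, loss_gradient, learning_rate=0.1, num_steps=100):
    """Iteratively move the parameters against the gradient of the loss."""
    for _ in range(num_steps):
        g = loss_gradient(w)        # gradient of the loss at the current parameters
        w = w - learning_rate * g   # take a step downhill
    return w

# Toy example: minimizing the loss ||w||^2, whose gradient is 2 * w.
w0 = np.array([3.0, -2.0])
w_opt = gradient_descent(w0, lambda w: 2.0 * w)
print(w_opt)  # ends up very close to the minimum at the origin
```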
So the questions that we would like to tackle today are, first, how can we choose the right data for training our artificial neural networks, and is there anything we need to keep in mind when choosing this data? And second, how can we optimize the free parameters in the case of a huge training data set, which is actually the norm nowadays for most modern applications?
For example, ImageNet consists of millions of images.
And as we pointed out last time, optimizing a neural network using backpropagation would require a backpropagation step for each of these training pairs just to perform a single update of the free parameters. So we will see different schemes today that help us in these situations.
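To see why this becomes expensive, here is a hedged sketch of one full-batch update: the gradient is accumulated over every single training pair before the free parameters change once. The function per_sample_gradient stands in for the backpropagation step of one training pair; it is a placeholder for illustration, not a real library routine:

```python
import numpy as np

def full_batch_update(w, data, per_sample_gradient, learning_rate=0.01):
    """A single gradient descent update computed over the whole training set."""
    g = np.zeros_like(w)
    for x, y in data:                      # one backpropagation step per training pair ...
        g += per_sample_gradient(w, x, y)
    g /= len(data)                         # average gradient over all pairs
    return w - learning_rate * g           # ... for just one update of the parameters

# Tiny illustration with a least-squares gradient for a linear model.
def lsq_grad(w, x, y):
    return 2.0 * (np.dot(w, x) - y) * x

data = [(np.array([1.0, 0.0]), 1.0), (np.array([0.0, 1.0]), -1.0)]
w_new = full_batch_update(np.zeros(2), data, lsq_grad)
print(w_new)
```

With millions of images, as in ImageNet, that inner loop has to run over the entire data set for every single parameter update.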
So first of all, how can you choose training data?
We didn't discuss this, so let's shed some light on this.
First of all, an ideal set of training data should cover all the interesting characteristics that are present in the data while not being biased towards particular classes. If you think about classification tasks, you would rather not want a classifier that is always drawn towards a certain class, but one that judges independently on a sound statistical basis.
So it should cover all classes, that's for sure.
I mean, if we miss any class, then we cannot expect our artificial neural network to know
about this class.
And the amount of data per class should somehow be related to its probability of occurrence.
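One simple way to check this is to compare the class frequencies in the training set with the occurrence probabilities you expect; here is a small sketch where the labels and the expected priors are made up purely for illustration:

```python
from collections import Counter

# Hypothetical training labels and the occurrence probabilities we expect.
labels = ["cat", "dog", "cat", "bird", "cat", "dog", "cat", "dog"]
expected_priors = {"cat": 0.5, "dog": 0.4, "bird": 0.1}

counts = Counter(labels)
total = len(labels)
for cls, prior in expected_priors.items():
    observed = counts.get(cls, 0) / total
    print(f"{cls}: observed {observed:.2f}, expected {prior:.2f}")
```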
Just to give you some impression of what this might look like, imagine the different classes as dots in some high-dimensional feature space. So these red dots are class one, and we choose some dots that lie pretty close to them as a second class. Then we would like to train a neural network that is capable of separating these two classes.
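You can picture the situation roughly like this: two point clouds that lie close to each other in feature space, sketched here in two dimensions with made-up cluster centres (this is only an illustration, not the data shown on the slide):

```python
import numpy as np

rng = np.random.default_rng(0)

# Class one ("red dots") and a second class centred nearby in feature space.
class_one = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
class_two = rng.normal(loc=[1.5, 1.0], scale=0.5, size=(50, 2))

# Stack everything into a training set of feature vectors and labels.
X = np.vstack([class_one, class_two])
y = np.array([0] * 50 + [1] * 50)
# A network trained on (X, y) should learn a boundary separating the two classes.
```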