Welcome back, everybody, to our Deep Learning lecture.
Just to put everything into context: so far we've seen that
deep learning has changed our research quite a bit,
so a lot of things are now possible.
Classification results on multi-class problems that people considered extremely hard
can now be achieved with rather high accuracy rates.
We wanted to look into that a little more, so we started looking into the background.
And we started with the perceptron, which is essentially just one neuron.
And we've seen that the perceptron is not able to solve problems
that are not linearly separable, like the XOR problem.
But then we've seen if we expand this into layers,
then we can start modeling much more complex functions.
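To make that concrete, here is a minimal sketch (my own, not from the lecture) in numpy: with hand-picked weights, a hidden layer of two threshold neurons is enough to represent XOR, while no single perceptron can, because the four points are not linearly separable.

```python
import numpy as np

def step(z):
    # Heaviside step activation, as in the classic perceptron
    return (z > 0).astype(float)

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)

# Two-layer network with hand-picked weights:
W1 = np.array([[1.0, 1.0],     # column 0: hidden unit ~ logical OR
               [1.0, 1.0]])    # column 1: hidden unit ~ logical AND
b1 = np.array([-0.5, -1.5])
w2 = np.array([1.0, -1.0])     # output ~ "OR and not AND", i.e. XOR
b2 = -0.5

H = step(X @ W1 + b1)
y = step(H @ w2 + b2)
print(y)  # -> [0. 1. 1. 0.]
```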
And we've seen the so-called universal approximation
theorem, which already tells us that a neural network with just one
hidden layer is able to approximate essentially any continuous function.
So this is a very powerful method.
But there are error bounds.
And this epsilon may be rather high if we don't have enough neurons in there.
And unfortunately, we don't know how many neurons we actually need.
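As a reminder, the one-hidden-layer approximator and the statement of the theorem look roughly like this (my own informal summary, not the lecture's exact formulation):

```latex
% One hidden layer with N neurons and activation \sigma:
F(x) = \sum_{i=1}^{N} \alpha_i \,\sigma\!\left(w_i^{\top} x + b_i\right)

% Universal approximation (informal): for every continuous f on a
% compact set K \subset \mathbb{R}^d and every \varepsilon > 0 there
% exist N, \alpha_i, w_i, b_i such that
\sup_{x \in K} \bigl| F(x) - f(x) \bigr| < \varepsilon
% The theorem gives no bound on how large N has to be.
```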
So we've then seen that for some problems, it's rather hard to model this just
with a single layer.
But we've seen that already expanding to a second layer
increases the modeling capability quite a bit.
And we could, for example, map any decision tree to a two-layer neural network.
So this gave us essentially a motivation for building deeper networks.
And then we've also seen that when we start building those deeper networks,
we also have to consider optimization.
And in particular, we have to train with this backpropagation algorithm.
And sometimes this can be numerically unstable.
And there may be problems like vanishing gradients or exploding gradients,
which we will also talk about a little today.
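As a small numerical preview (my own sketch, not part of the lecture), you can already see the vanishing-gradient effect by backpropagating through a chain of sigmoid layers: each layer multiplies the gradient by W^T diag(sigma'(z)), and since sigma'(z) is at most 0.25, the gradient norm tends to shrink layer after layer.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n, depth = 10, 20

# Forward pass through a deep chain of linear + sigmoid layers
a = rng.normal(size=n)
weights, inputs = [], []
for _ in range(depth):
    W = rng.normal(scale=1.0 / np.sqrt(n), size=(n, n))
    weights.append(W)
    inputs.append(a)
    a = sigmoid(W @ a)

# Backward pass: per layer the gradient is multiplied by W^T diag(sigma'(z))
g = np.ones(n)
for W, a_in in reversed(list(zip(weights, inputs))):
    z = W @ a_in
    g = W.T @ (sigmoid(z) * (1.0 - sigmoid(z)) * g)
    print(f"gradient norm: {np.linalg.norm(g):.2e}")  # shrinks layer by layer
```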
And in particular, we've seen different methods
for optimizing this gradient descent and for selecting the step size.
And there are rather clever algorithms that automatically
adjust the step size for each parameter,
depending on how strongly it varies.
And we've seen that we can use momentum to reuse
previous directions of our gradient descent,
so that we can prevent oscillations in the descent procedure.
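For reference, here is a minimal sketch (my own, in the usual heavy-ball formulation) of gradient descent with momentum: the velocity v accumulates a decaying average of past gradients, so consistent directions build up speed while oscillating components tend to cancel. The per-parameter step-size adjustment mentioned above is the idea behind methods such as Adagrad, RMSprop, and Adam.

```python
import numpy as np

def sgd_momentum(grad_fn, w, lr=0.1, mu=0.9, steps=100):
    """Gradient descent with (heavy-ball) momentum."""
    v = np.zeros_like(w)
    for _ in range(steps):
        g = grad_fn(w)
        v = mu * v - lr * g   # reuse previous direction, add new gradient
        w = w + v
    return w

# Toy example: a badly conditioned quadratic f(w) = 0.5 * w^T A w.
# At this step size, plain gradient descent would diverge along the
# steep axis, while momentum damps the oscillation and converges.
A = np.diag([1.0, 25.0])
grad = lambda w: A @ w
print(sgd_momentum(grad, np.array([5.0, 5.0])))  # close to the minimum [0, 0]
```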
So now we want to talk a bit more about details of this optimization process.
In particular, we want to talk about activation functions,
because so far we've had only a very coarse view of the different activation functions.
So today we want to look into more detail about them.
And we want to talk about convolutional neural networks
and how we can actually model convolution, a very powerful operation,
in such a neural network.
And you will see that it's actually not that hard to model
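To preview that point, here is a minimal numpy sketch (my own, not from the lecture) of a 2-D convolution layer: each output value is just a neuron computing a dot product with one image patch, with the same weights reused at every position. Like most deep learning frameworks, it actually computes a cross-correlation.

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Naive 2-D 'valid' convolution (cross-correlation): each output
    pixel is a dot product between the kernel and one image patch,
    i.e. a neuron whose weights are shared across all positions."""
    H, W = image.shape
    kh, kw = kernel.shape
    out = np.zeros((H - kh + 1, W - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A 3x3 edge-detection kernel applied to a random "image":
rng = np.random.default_rng(0)
img = rng.random((8, 8))
k = np.array([[-1, 0, 1],
              [-1, 0, 1],
              [-1, 0, 1]], dtype=float)
print(conv2d_valid(img, k).shape)  # (6, 6)
```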