Welcome back to deep learning. So in today's lecture we want to talk about activations
and convolutional neural networks. We've split this up into several parts and the first
one will be about classical activation functions. Later we will talk about convolutional neural networks, pooling, and the like. So let's start with activation functions, and you can see that activation functions go back to a biological motivation. Remember that everything we have been doing so far we also somehow motivated with biology. Biological neurons are connected to other neurons via synapses.
This way they can actually communicate with each other. The axons are covered by a myelin sheath that electrically insulates them, and through them the neurons are able to communicate with other cells. When they are communicating, they are not just passing on every input they receive. They have a selective mechanism: a single stimulus does not necessarily suffice to generate an output signal. The total signal must be above a threshold
and what then happens is that an action potential is triggered. After that it repolarizes and
then returns to the resting state. Interestingly, it does not matter how strongly the cell is activated: it always fires the same action potential and then returns to its resting state. The actual biological activation is even more complicated. There are different axons, and they are connected to the synapses of other neurons. Along these paths they are covered with Schwann cells so that they can deliver the action potential towards the next synapse. There are also ion channels, which are used to stabilize the entire electrical process and bring everything back into equilibrium after the activation pulse. So what we can see is that the knowledge essentially
lies in the connections between the neurons. We have both inhibitory and excitatory connections.
The synapses anatomically enforce feed-forward processing, so it is very similar to what we have seen so far. However, those connections can point in any direction, so they can also form cycles, and you get entire networks of neurons that are connected via different axons in order to form different cognitive functions. What is crucial is the sum of activations: only if this sum is above the threshold will you actually end up with an activation. These activations are electrical spikes of a fixed intensity and, to be honest, the whole system is also time-dependent. Hence the neurons also encode information over time. So it is not just a single event that passes through: the whole process runs at a certain frequency, and this enables processing over time.
Now, the activations in artificial neural networks have so far been nonlinear activation functions, mainly motivated by universal function approximation. If we don't have the nonlinearities, we can't get a powerful network: without them we would just end up with matrix multiplications. Compared to biology, we have something like the sign function that can model all-or-nothing responses. Generally, our activations have no time component; maybe this could be modeled by the strength of the activation instead. Of course, the sign function is also mathematically undesirable, because its derivative is zero everywhere except at zero, where it is infinite. So it is absolutely not suited for backpropagation. Hence we have been using the sigmoid function, because we can compute an analytic derivative.
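For reference, these are the standard definitions of the two functions just mentioned; the formulas themselves are not spelled out in the transcript and are added here for clarity:

\[
\operatorname{sign}(x) =
\begin{cases}
+1 & x > 0 \\
0 & x = 0 \\
-1 & x < 0
\end{cases},
\qquad
\frac{\mathrm{d}}{\mathrm{d}x}\operatorname{sign}(x) = 0 \quad \text{for } x \neq 0,
\]
\[
\sigma(x) = \frac{1}{1 + e^{-x}},
\qquad
\sigma'(x) = \sigma(x)\bigl(1 - \sigma(x)\bigr).
\]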
Now the question is: can we do better? So let's look at some activation functions. The simplest one that we can think of is the linear activation, where we just reproduce the input, possibly scaled by some parameter alpha, to produce the output. If we do so, we get a derivative of alpha. It is very simple, and it would render the entire optimization a convex problem. However, if we don't introduce any nonlinearity, we are essentially stuck with matrix multiplications. As such, we only list it here for completeness; it would not allow us to build deep neural networks as we know them.
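To make this last point concrete, here is a minimal NumPy sketch; it is my own illustration and not part of the lecture. It shows that a scaled linear activation f(x) = alpha * x has the constant derivative alpha, and that two stacked layers with only linear activations collapse into a single matrix multiplication:

import numpy as np

# Scaled linear activation f(x) = alpha * x and its derivative f'(x) = alpha.
def linear(x, alpha=1.0):
    return alpha * x

def linear_derivative(x, alpha=1.0):
    return alpha * np.ones_like(x)

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 1))    # input vector
W1 = rng.normal(size=(3, 4))   # weights of the first layer
W2 = rng.normal(size=(2, 3))   # weights of the second layer

# Two layers with a linear activation in between ...
deep = W2 @ linear(W1 @ x)
# ... are equivalent to a single matrix multiplication with W2 @ W1.
shallow = (W2 @ W1) @ x
print(np.allclose(deep, shallow))  # True: no expressive power is gained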
Now the sigmoid function is the first one that we started with. It essentially saturates towards one and zero, so it has a probabilistic output, which is very nice. However, it saturates for x going towards
very large or very negative values. You can see here that the derivative is already very close to zero around plus or minus three.
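As a small numerical illustration of this saturation, here is a short sketch of my own (not from the lecture): the analytic derivative sigma'(x) = sigma(x) * (1 - sigma(x)) peaks at 0.25 at x = 0 and has already dropped below 0.05 at x = 3, which is what makes deep sigmoid networks prone to vanishing gradients:

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sigmoid_derivative(x):
    s = sigmoid(x)
    return s * (1.0 - s)

for x in [0.0, 1.0, 3.0, 5.0]:
    print(f"sigmoid'({x}) = {sigmoid_derivative(x):.4f}")
# sigmoid'(0.0) = 0.2500
# sigmoid'(1.0) = 0.1966
# sigmoid'(3.0) = 0.0452
# sigmoid'(5.0) = 0.0066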