Welcome everybody to pattern recognition. So today we want to look into multilayer perceptrons
that are also called neural networks. So we'll give a brief sketch of the ideas of neural
networks.
Okay, so let's have a look at multilayer perceptrons. Note that we only cover very basic concepts here. If you're interested in neural networks, we have an entire class on deep learning where we talk about all the details, so here we will stay rather on the surface. You may know that neural networks are extremely popular, partly because they also have this physiological motivation. We've seen that the perceptron essentially computes a weighted sum of its inputs, an inner product plus some bias, and you could say that this has some relation to neurons: neurons are connected via axons to other neurons, and they collect the electrical activations coming from those other neurons. Once the collected inputs exceed a certain threshold, the neuron is activated, and you typically have this zero-or-one response. It is either activated or not, and it doesn't matter how strong the actual activation is: if you are above the threshold you have an output, and if you're not, there is simply no output.
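To make the weighted sum and the hard threshold concrete, here is a minimal sketch of such a unit in NumPy; the input, weight, and bias values are made up for illustration and are not from the lecture.

```python
import numpy as np

def perceptron(x, w, b):
    """Perceptron unit: weighted sum (inner product) plus bias, then a hard threshold."""
    net = np.dot(w, x) + b          # inner product of weights and inputs, plus the bias
    return 1.0 if net > 0 else 0.0  # all-or-none (step) response

# Illustrative values (assumed, not from the lecture)
x = np.array([0.5, -1.2, 3.0])   # input feature vector
w = np.array([0.4, 0.1, -0.2])   # weights
b = 0.1                          # bias

print(perceptron(x, w, b))  # 1.0 or 0.0, depending on whether the net input exceeds the threshold
```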
Now we have these neurons, and we don't talk about biological ones here but about the mathematical ones based on the perceptron. We can then go ahead and arrange them in layers, layer on top of layer. We essentially have some input neurons, which simply carry the input feature vector plus a bias that we indicate here with a one, and this is then passed on in a fully connected fashion, so we are essentially connecting everything with everything. Then we have hidden layers, and they are called hidden because we cannot directly observe what is happening inside them. Only if we have a given input sample and we know the weights can we actually compute what is happening there; otherwise we generally don't see it, we only see the output at the very end. The output is observable again, and we typically have a desired output, which can be compared to the output of the network, and this comparison then allows us to construct a training procedure.
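As a rough sketch of this arrangement, here is a tiny fully connected network with one hidden layer; the layer sizes, random weights, desired output, and the squared-error comparison are assumptions for illustration, not the lecture's exact setup. The logistic activation used here is discussed further below.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy sizes: 3 input features, 4 hidden units, 2 outputs
W1 = rng.normal(size=(4, 3)); b1 = np.zeros(4)   # input -> hidden (fully connected)
W2 = rng.normal(size=(2, 4)); b2 = np.zeros(2)   # hidden -> output (fully connected)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

x = np.array([0.5, -1.2, 3.0])     # input feature vector (plus an implicit bias)
h = sigmoid(W1 @ x + b1)           # hidden layer: not directly observed
y = sigmoid(W2 @ h + b2)           # output layer: observable

t = np.array([1.0, 0.0])           # desired (target) output
loss = 0.5 * np.sum((y - t) ** 2)  # comparison to the desired output drives training
```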
Note that we are not only computing weighted sums of the input elements; what is also very important is the non-linearity. We need to model this all-or-none response function, and we've seen that Rosenblatt originally used the step function. Of course, we could also use linear functions, but if we did, everything would essentially collapse down to a single big matrix multiplication, as we will see towards the end of this video. In every layer, if it is fully connected, you are essentially multiplying the activations of the previous layer by a weight matrix, so the layer itself can be modeled simply as a matrix.
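A quick way to see this collapse is the following sketch; the matrix sizes and random values are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(size=3)
W1 = rng.normal(size=(4, 3))   # first fully connected layer (no activation)
W2 = rng.normal(size=(2, 4))   # second fully connected layer (no activation)

# Two purely linear layers ...
y_two_layers = W2 @ (W1 @ x)

# ... are exactly one linear layer with the combined matrix W2 @ W1
y_one_layer = (W2 @ W1) @ x

print(np.allclose(y_two_layers, y_one_layer))  # True: without a non-linearity, depth adds nothing
```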
What is typically not modeled as a matrix is the activation function, which is applied element-wise. The step function was Rosenblatt's original choice, but in later, classical approaches the following two functions were commonly used: the sigmoid function, that is, the logistic function, and as an alternative the hyperbolic tangent, which has some advantages with respect to the optimization.
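As a small sketch, these two classical activations applied element-wise to a vector of net inputs; the example values are made up.

```python
import numpy as np

def sigmoid(z):
    """Logistic (sigmoid) function, squashes each element to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

net = np.array([-2.0, 0.0, 2.0])   # example net inputs (assumed values)

print(sigmoid(net))   # roughly [0.12, 0.5, 0.88]
print(np.tanh(net))   # roughly [-0.96, 0.0, 0.96], squashes to (-1, 1)
```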
So we can now write down the units of these networks as essentially sums over the previous layer. We have some y_i, the output of unit i in the previous layer, which we indicate here with l minus one. This is multiplied with some weight w_ij, and we also have the bias w_0j in the current layer l. This sum gives the net input, which is then run through the activation function f, where f is one of the choices we saw above. This introduces the non-linearity and produces the output of the respective neuron in layer l.
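Written out as a formula, following the description above (the name net_j is my own notation for the weighted sum feeding neuron j), this would be:

```latex
% Net input of unit j in layer l, then the element-wise non-linearity f
\mathrm{net}_j^{(l)} = \sum_i w_{ij} \, y_i^{(l-1)} + w_{0j},
\qquad
y_j^{(l)} = f\bigl(\mathrm{net}_j^{(l)}\bigr)
```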
Now you want to be able to train this, and this is typically done using the backpropagation algorithm. This is a supervised learning procedure, and backpropagation helps you to compute the gradients. So backpropagation is actually not the learning algorithm itself but the way of computing the gradients.
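As a minimal sketch of what backpropagation computes, here is the gradient calculation for a one-hidden-layer network with sigmoid activations and squared-error loss; all sizes and values are assumptions for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
x = rng.normal(size=3)                     # input (assumed values)
t = np.array([1.0, 0.0])                   # desired output (assumed)
W1 = rng.normal(size=(4, 3)); b1 = np.zeros(4)
W2 = rng.normal(size=(2, 4)); b2 = np.zeros(2)

# Forward pass
h = sigmoid(W1 @ x + b1)                   # hidden activations
y = sigmoid(W2 @ h + b2)                   # network output
loss = 0.5 * np.sum((y - t) ** 2)

# Backward pass: chain rule, propagating the error from the output back to earlier layers
delta2 = (y - t) * y * (1 - y)             # d loss / d net of the output layer
dW2 = np.outer(delta2, h)                  # gradients for the output-layer weights
delta1 = (W2.T @ delta2) * h * (1 - h)     # error propagated back to the hidden layer
dW1 = np.outer(delta1, x)                  # gradients for the hidden-layer weights

# A separate optimizer (e.g. gradient descent) would then use dW1, dW2 to update the weights.
```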