We looked at learning, and the maths, seen from a slight distance, looks exactly like it has always looked before.
We're minimizing a loss function, which usually turns out to be squared-error loss, and that allows us to do gradient descent.
For that we need to compute partial derivatives, and if we do that we get a weight update rule that is relatively simple.
So for single-layer perceptrons you iterate this weight update until you have good weights.
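As a minimal sketch (not from the lecture): assuming a single sigmoid unit and squared-error loss, the update that falls out of those partial derivatives looks roughly like this. The function name, learning rate, and toy data are hypothetical, chosen only for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def perceptron_update(w, x, y, alpha=0.1):
    """One gradient-descent step on squared-error loss for a single
    sigmoid unit: w <- w + alpha * (y - h) * h * (1 - h) * x."""
    h = sigmoid(np.dot(w, x))            # current prediction
    return w + alpha * (y - h) * h * (1 - h) * x

# Iterate the update over the examples until the weights look good.
rng = np.random.default_rng(0)
w = rng.normal(size=3)                   # 2 inputs plus a bias weight
data = [(np.array([1.0, 0.0, 1.0]), 1.0),   # last component is the bias input
        (np.array([0.0, 1.0, 1.0]), 0.0)]
for _ in range(1000):
    for x, y in data:
        w = perceptron_update(w, x, y)
```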
Nothing terribly interesting is happening here, but for certain functions we get extremely good learning behavior.
Okay, so for the majority function we get vastly better performance than, say, decision tree learning.
For other functions, say the restaurant data we looked at first, perceptrons have no chance.
Why not?
Because there is a realizability problem: one-layer perceptrons cannot even express the Boolean function behind the restaurant data.
So there we get good performance with decision tree learning, and perceptrons just have no chance.
So how do we change that?
Well, instead of single-layer perceptrons we use multi-layer neural networks: you have the input layer, you have the output layer (I've chosen a single output here, so this is a Boolean function), and we have hidden layers in between.
The neurons on this layer and on that layer look the same.
So these hidden layers, the hidden units, can do things: they can actually nest behavior.
The typical example is this network here, which is also a multi-layer one, just turned by 90 degrees, so you have a two-output neural network.
And we can get non-linearities just by nesting these units.
If you nest twice, you get something that's quadratic.
And if you think about it graphically: if you combine two of these cliffs you can get a ridge, if you combine two ridges you get a bump, and then you can build stuff on top of that.
So these layered neural networks have the potential to approximate essentially any surface.
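To illustrate that nesting idea, here is a hedged sketch of a one-hidden-layer forward pass. The sigmoid activations and the hand-picked weights (which combine two opposing cliffs into a ridge) are my assumptions for the example, not values from the lecture.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, W_hidden, w_out):
    """Forward pass of a network with one hidden layer and one output:
    the output unit nests the already non-linear hidden activations."""
    h = sigmoid(W_hidden @ x)      # hidden layer: several soft 'cliffs'
    h = np.append(h, 1.0)          # bias input for the output unit
    return sigmoid(w_out @ h)      # output unit combines them into a ridge

# Two opposing cliffs, softly AND-ed by the output unit, give a ridge
# that is high for x[0] roughly between 0.25 and 0.75.
W_hidden = np.array([[ 10.0, 0.0, -2.5],    # cliff rising near x0 = 0.25
                     [-10.0, 0.0,  7.5]])   # cliff falling near x0 = 0.75
w_out = np.array([10.0, 10.0, -15.0])
x = np.array([0.5, 0.0, 1.0])               # last component is the bias input
print(forward(x, W_hidden, w_out))          # close to 1 inside the ridge
```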
The problem is how do we learn with them?
And the idea here is that, given an actual input-output pair, you look at the current weights (we do an iterative procedure again), which allow you to compute forward from inputs to outputs.
During learning we're in the good situation that we know what the output value should have been, in this case true or false.
So we can see whether we've gotten it right or not, which allows us to compute backwards, like we do in linear regression, one level at a time: that gives us corrected weights on this layer and corrected virtual inputs, which then allow us to iterate the procedure back through the network.
So that's what this backpropagation rule does; it's just the update rule, except there is one thing we have to change: we can't separate each layer into single-output cases, we have to work with output vectors.
That's the only change in the math: we have to make it multi-output from the start, and then everything becomes vectors, but we still get an update rule that is essentially the same.
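To make that vectorised update concrete, here is a small sketch of one backpropagation step for a one-hidden-layer network with sigmoid units and squared-error loss. Bias terms are omitted for brevity, and all names and shapes are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def backprop_step(x, y, W1, W2, alpha=0.1):
    """One backpropagation step; y and the error terms are vectors throughout."""
    # Forward: compute from inputs to outputs with the current weights.
    h = sigmoid(W1 @ x)
    out = sigmoid(W2 @ h)
    # Backward: output deltas, then propagate them to the hidden layer.
    delta_out = (y - out) * out * (1 - out)       # one entry per output unit
    delta_hid = (W2.T @ delta_out) * h * (1 - h)  # 'corrected virtual' errors
    # Same basic update rule as before, applied layer by layer with vectors.
    W2 = W2 + alpha * np.outer(delta_out, h)
    W1 = W1 + alpha * np.outer(delta_hid, x)
    return W1, W2

# Hypothetical shapes: 3 inputs, 4 hidden units, 2 outputs.
rng = np.random.default_rng(0)
W1 = rng.normal(size=(4, 3))
W2 = rng.normal(size=(2, 4))
x, y = np.array([1.0, 0.0, 1.0]), np.array([1.0, 0.0])
W1, W2 = backprop_step(x, y, W1, W2)
```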
Recap: Artificial Neural Networks (Part 3)
Main video on this topic: chapter 8, clip 16.