Our big topic now is machine learning, and we're gearing up to neural networks.
We have been looking at linear regression, and we will then look at neural networks, which basically
use linear regression in each unit, and we'll see what that does.
So linear regression is essentially a learning method where we learn hypotheses from examples.
We look at the regression problem first, so not classification but regression, which
essentially means fitting a linear model to a set of examples.
And, as always, we do the simple case first, which is univariate linear regression, meaning
a linear function of one real variable returning one real value.
Very simple.
And that really means we have points in the x, y plane, and we have to find a
linear model that describes them.
The question is, how do we do that?
And the answer is, just like always, we minimize the error in the hypothesis space of all linear
functions.
It's kind of always our approach right now.
And in this particular hypothesis space, there are a couple of things we use.
One is that linear models are determined by two values, the offset and the slope.
So we always talk about vectors w with offset and slope components, and that gives us
a linear function like the one below.
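The formula itself is not in the transcript; in the usual notation for univariate linear regression it would presumably be of the form

```latex
h_{\mathbf{w}}(x) = w_1 x + w_0, \qquad \mathbf{w} = (w_0, w_1),
```

with w_0 the offset and w_1 the slope.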
And with that, we can then do things like minimizing the squared error loss, which is a nice
and analytic quantity, which means we can look at partial derivatives.
And in this case, we can even minimize the loss in closed form by setting the derivatives
to zero and solving the resulting equations.
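To make the "minimize and solve" step concrete (the clip does not spell it out): for N examples (x_j, y_j), the squared error loss is

```latex
L(\mathbf{w}) = \sum_{j=1}^{N} \bigl( y_j - (w_1 x_j + w_0) \bigr)^2 ,
```

and setting the partial derivatives with respect to w_0 and w_1 to zero yields the standard closed-form solution

```latex
w_1 = \frac{N \sum_j x_j y_j - \bigl(\sum_j x_j\bigr)\bigl(\sum_j y_j\bigr)}
           {N \sum_j x_j^2 - \bigl(\sum_j x_j\bigr)^2},
\qquad
w_0 = \frac{\sum_j y_j - w_1 \sum_j x_j}{N}.
```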
And what we learned before, and used yesterday, is that once we have this basic idea
of minimizing the squared error loss, we can attach all kinds of bells and whistles to it.
One of them was regularization.
Another might be to have some other kind of loss function and so on.
We could also have the idea that there are certain functions we want to avoid because we have
looked at them and they are bad, so let's not go there, or something like that.
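One common way to write this regularization idea (the exact form is not given in this clip) is to add a complexity penalty to the loss, for example an L2 penalty:

```latex
L_{\text{reg}}(\mathbf{w}) = \sum_{j=1}^{N} \bigl( y_j - h_{\mathbf{w}}(x_j) \bigr)^2 + \lambda \lVert \mathbf{w} \rVert^2 ,
```

where the parameter lambda controls how strongly large weights, and thus more complex hypotheses, are penalized.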
So the idea is that, even though in this case we have a closed-form solution, we should
have a backup plan that still works once we have added bells and whistles.
And that backup plan is essentially gradient descent, which works extremely well here, because in
this case we have a convex function with a unique global minimum.
Okay, so in these cases gradient descent works well, and gradient descent always works
the same way: we basically loop.
We have a current point, we have an update equation, and we just iterate the update equation.
And if we are lucky, we get convergence relatively fast.
We often also introduce a learning rate parameter alpha here, which is typically smaller than
or equal to one, and which steers how large our learning steps are.
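The update equation being iterated is, in its usual form (the slide version is not in the transcript),

```latex
w_i \;\leftarrow\; w_i - \alpha \, \frac{\partial}{\partial w_i} L(\mathbf{w}) ,
```

that is, each weight is moved a small step against the gradient of the loss.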
And if alpha goes to zero, of course, you can already see that nothing changes anymore.
So if you want things to converge, or at least stop, you just make alpha small.
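Here is a minimal sketch of that loop for the univariate case; the function name, step count, and example data are illustrative assumptions, not taken from the clip.

```python
def gradient_descent(xs, ys, alpha=0.01, steps=5000):
    """Fit h_w(x) = w1*x + w0 by gradient descent on the squared error loss."""
    w0, w1 = 0.0, 0.0                                   # current point in weight space
    for _ in range(steps):
        # Partial derivatives of sum_j (y_j - (w1*x_j + w0))^2 at the current point.
        grad_w0 = sum(-2 * (y - (w1 * x + w0)) for x, y in zip(xs, ys))
        grad_w1 = sum(-2 * (y - (w1 * x + w0)) * x for x, y in zip(xs, ys))
        # Update equation: step against the gradient, scaled by the learning rate alpha.
        w0 -= alpha * grad_w0
        w1 -= alpha * grad_w1
    return w0, w1

# Example usage: points that lie roughly on y = 2x + 1.
print(gradient_descent([0, 1, 2, 3], [1.1, 2.9, 5.2, 7.0]))
```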
One of the things we have seen, for simulated annealing last semester, is that if you let
alpha decay, then you can force convergence.
Of course, the question is whether you are converging to the right thing.
If you're in a local minimum, you might not be able to escape.
But at least you're in a local minimum.
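A typical way to let alpha decay (one common choice, not specified in the clip) is a schedule such as

```latex
\alpha_t = \frac{\alpha_0}{1 + c\, t} ,
```

so the step size shrinks with the iteration number t and the iterates are forced to settle down, possibly in a local minimum, as just discussed.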
Recap: Regression and Classification with Linear Models (Part 1)
The main video on this topic is chapter 8, clip 11.