49 - Recap Clip 8.11: Regression and Classification with Linear Models (Part 1) [ID:30452]

Our big topic now is machine learning, and we're gearing up to neural networks.

And we have been looking at linear regression and then at neural networks, which basically

use linear regression in each unit.

And we'll see what that does.

So linear regression is essentially a learning method where we're learning classifiers from examples.

We look at the regression problem first, so not classification, but regression, which essentially means fitting a linear model to a set of examples.

And we, as always, do the simple case first, which is univariate linear regression, meaning a linear function from one real variable to one real value.

Very simple.

And that really means we have points in the x,y plane, and we have to find a linear model that describes them.

The question is, how do we do that?

And the answer is, just like always, we minimize the error in the hypothesis space of all linear

functions.

It's kind of always our approach right now.

And in this particular hypothesis space, there are a couple of things we use.

One is that linear models are determined by two values, the offset and the slope.

So we always talk about vectors w, which have offset and slope components.

And that gives us a linear function like this one.
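As a minimal sketch, assuming the standard notation for univariate linear regression (the exact symbols on the slide may differ), the hypothesis determined by such a weight vector is

    h_w(x) = w_1 x + w_0,  with  w = (w_0, w_1).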

And with that, we can then do things like minimize the squared error loss, which is nice and analytic, which means we can look at partial derivatives. And in this case, we can even minimize the loss in closed form by setting those derivatives to zero and solving.
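As a sketch of that closed-form solution, here is the textbook least-squares computation over N examples (x_j, y_j); this is the standard derivation, not necessarily the exact formula shown in the clip:

    Loss(h_w) = \sum_{j=1}^{N} (y_j - (w_1 x_j + w_0))^2

    \frac{\partial Loss}{\partial w_0} = 0, \ \frac{\partial Loss}{\partial w_1} = 0
    \quad\Rightarrow\quad
    w_1 = \frac{N \sum_j x_j y_j - \sum_j x_j \sum_j y_j}{N \sum_j x_j^2 - (\sum_j x_j)^2},
    \qquad
    w_0 = \frac{\sum_j y_j - w_1 \sum_j x_j}{N}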

And what we've learned before, and used yesterday, is that once we have this basic idea of minimizing the squared error loss, we can attach bells and whistles to it.

One of them was regularization. Another might be to use some other kind of loss function, and so on. We could have the idea that there are certain functions we want to avoid, because we've looked at them and they're bad, so let's not go there.
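A hedged sketch of how regularization expresses that idea (the lambda and the complexity measure are generic placeholders, not necessarily the ones used in the lecture): instead of minimizing the empirical loss alone, we minimize a combined cost that penalizes hypotheses we consider bad, for instance overly complex ones,

    Cost(h) = EmpLoss(h) + \lambda \, Complexity(h).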

So the idea is that, even though in this case we have a closed-form solution, we should have a backup plan that still works once we've added those bells and whistles.

And that is essentially gradient descent, which works extremely well here because in this case we have a convex function with a unique global minimum.

Okay, so in these cases gradient descent works well, and gradient descent always works the same way: we basically loop. We have a current point, we have an update equation, and we just iterate the update equation. And if we're lucky, we get convergence relatively fast.
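Here is a minimal Python sketch of that loop for the univariate model h_w(x) = w_1 x + w_0 with squared-error loss; the function name and the example data are illustrative, not taken from the lecture:

    import numpy as np

    def gradient_descent(x, y, alpha=0.01, steps=1000):
        """Batch gradient descent for univariate linear regression (squared-error loss)."""
        w0, w1 = 0.0, 0.0                     # current point in weight space
        n = len(x)
        for _ in range(steps):
            err = (w1 * x + w0) - y           # prediction error on all examples
            # Partial derivatives of the mean squared error w.r.t. w0 and w1.
            grad_w0 = 2 * err.sum() / n
            grad_w1 = 2 * (err * x).sum() / n
            # Update equation: step against the gradient, scaled by the learning rate alpha.
            w0 -= alpha * grad_w0
            w1 -= alpha * grad_w1
        return w0, w1

    # Illustrative data: points roughly on y = 2x + 1 with a little noise.
    rng = np.random.default_rng(0)
    x = np.linspace(0.0, 5.0, 50)
    y = 2 * x + 1 + rng.normal(scale=0.3, size=x.shape)
    print(gradient_descent(x, y))             # expect roughly (1.0, 2.0)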

We often also introduce a learning rate parameter alpha here, which is typically smaller than or equal to one, and which steers our learning.

And if it goes to zero, of course, you can already see that nothing changes.

So if you want things to converge, or at least stop, you just make alpha small.

One thing we've seen, for simulated annealing last semester, is that if you let alpha decay, then you can force convergence.

Of course, the question is whether you're converging to the right thing there.

If you're in a local minimum, you might not be able to escape.

But at least you're in a local minimum.
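A minimal sketch of such a decay schedule, continuing the Python example above; the 1/(1+t) schedule is just one common illustrative choice, not necessarily the one discussed in the lecture:

    def decayed_alpha(t, alpha0=0.1):
        """Learning rate that shrinks as the iteration count t grows, so the updates
        get smaller and smaller and the iteration is forced to settle down,
        possibly in a local minimum if the loss is not convex."""
        return alpha0 / (1.0 + t)

    print([round(decayed_alpha(t), 5) for t in (0, 10, 100)])   # [0.1, 0.00909, 0.00099]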

Part of a chapter: Recaps

Accessible via: Open access

Duration: 00:06:31 min

Recording date: 2021-03-30

Uploaded on: 2021-03-31 11:16:48

Language: en-US

Recap: Regression and Classification with Linear Models (Part 1)

The main video on this topic is chapter 8, clip 11.
