56 - Recap Clip 8.18: Support Vector Machines [ID:30459]

Yesterday, we completed the section of the course where we talked about learning from examples via, essentially, linear regression, which is again weight fitting in a hypothesis space given by a very simple model.

We talked about straight-out linear regression and classification, and about how to use those in bio-inspired computation in neural networks, which basically means having networks of linear classifiers, for which you again do weight fitting.

The last thing we looked at was support vector machines. Essentially, we are doing linear regression or classification again, only this time we add two more tricks, which makes this extremely useful; once it is well implemented, you can basically use support vector machines out of the box. There are lots of packages around that give you this.

What are the basic ideas? The first one is that instead of just any linear classifier, we want a linear classifier that keeps maximum distance to all the examples, and the hope is that this generalizes better than a randomly chosen one. If we just do plain linear regression or classification, we do not know which of the possible separators we will actually get. So instead of doing straight-out error minimization over the hypothesis space, we minimize the error while also maximizing the distance (the margin) to the examples. In a way, instead of a thin classifier, we also optimize the thickness of a classifier that already works. That is the first idea: get better generalization properties by keeping our distance. The way this works out is that, almost miraculously, the method only takes the support vectors, the examples that are closest to the separator, into account for classification, that is, for the weight fitting. That is the one idea.
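For reference (the clip states this only in words), the maximum-margin idea can be written as the usual hard-margin optimization problem; the examples whose constraint ends up tight are exactly the support vectors:

```latex
% Hard-margin SVM primal: maximizing the margin 2/||w|| is the same as
% minimizing ||w||^2, while classifying every example (x_i, y_i),
% y_i in {-1, +1}, on the correct side with distance at least 1/||w||.
\[
\begin{aligned}
\min_{w,\,b}\quad & \tfrac{1}{2}\,\lVert w\rVert^{2} \\
\text{subject to}\quad & y_i\,(w \cdot x_i + b) \;\ge\; 1, \qquad i = 1,\dots,n .
\end{aligned}
\]
```

Moving any example that is not a support vector (as long as it stays outside the margin) does not change the solution at all.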

You can do this with the familiar minimization trick, by adding a breadth (margin) parameter that is also subject to the optimization. But you can also do something else, which is what is actually done: use quadratic programming methods, which are more efficient in practice.

They also allow, and that is the important part, the so-called kernel trick, which is not as easily done in the gradient-descent minimization method. The idea here is that if you have sets that are not linearly separable, you can sometimes transform them into higher dimensions to make them linearly separable. The example that everybody uses, because it goes between two and three dimensions and not between five and 2,000 dimensions, which is what actually happens in practice, is the one where you have a circle-shaped separator, which is of course not linear, but you can transform the data into a cone-shaped distribution. As you know, if you have a cone, then a circle is just one way of cutting the cone, namely if the normal vector of the cutting plane is collinear with the axis of the cone, and then you get a linear separator there.
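Here is a minimal sketch of that example, with my own choice of feature map rather than code from the lecture (numpy and scikit-learn assumed): lifting 2D points via phi(x1, x2) = (x1, x2, x1^2 + x2^2) makes the inside-the-circle versus outside-the-circle classes separable by a plane.

```python
import numpy as np
from sklearn.svm import LinearSVC  # plain linear classifier, no kernel

rng = np.random.default_rng(0)

# Two classes that only a circle can separate in 2D:
# label +1 inside the unit circle, -1 outside.
X = rng.uniform(-2, 2, size=(400, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 < 1.0, 1, -1)

# Explicit lift to 3D: phi(x1, x2) = (x1, x2, x1^2 + x2^2).
# In the lifted space the classes can be split by a plane,
# essentially a threshold on the third coordinate.
X_lifted = np.c_[X, X[:, 0] ** 2 + X[:, 1] ** 2]

clf = LinearSVC(C=10.0, max_iter=10_000).fit(X_lifted, y)
print("training accuracy after the explicit lift:", clf.score(X_lifted, y))
```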

Now, in principle, that is something you can always do, but the necessary transformations, like the one in this example, can play badly with your calculation. The advantage of the quadratic programming approach is that the data, the x part, only enters in the form of dot products, which is exactly what you feed into the kernel function, and very often you can compute that kernel value directly, without ever computing the high-dimensional transformation itself. The transformation disappears into the woodwork, and of course that makes this approach very attractive and computationally efficient.
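For reference, and again not written out in the clip itself, this is the standard dual problem that the quadratic programming solver handles; the data points appear only inside dot products, which is where the kernel is substituted:

```latex
% Dual of the hard-margin SVM: the data enter only as dot products x_i . x_j,
% so they can be replaced by a kernel K(x_i, x_j) = phi(x_i) . phi(x_j)
% without ever computing the transformation phi.
\[
\begin{aligned}
\max_{\alpha}\quad & \sum_{i=1}^{n} \alpha_i
  \;-\; \tfrac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n}
      \alpha_i \alpha_j\, y_i y_j\, (x_i \cdot x_j) \\
\text{subject to}\quad & \alpha_i \ge 0, \qquad \sum_{i=1}^{n} \alpha_i y_i = 0 .
\end{aligned}
\]
```

Only the examples whose multiplier alpha_i is strictly positive, the support vectors, appear in the resulting classifier, which predicts the sign of the kernel-weighted sum over those examples plus the offset.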

And that is what SVM packages use. They give you a standard set of kernels, and then you can project up, and that often gives you a separable feature space. You can imagine that already with these cone-shaped embeddings you get all kinds of dividing lines as conic sections: circles, ellipses, parabolas, and so on. So if you adjust the weights, instead of just having linear models you essentially get conic-section models, just by this little trick. And many point sets become separable, or almost separable, if you allow yourself all kinds of conic sections. And if you go up further in dimensionality, you basically get polynomial-shaped separating surfaces.
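To make the "out of the box" point concrete, here is a small sketch of my own (using scikit-learn as one of the many packages mentioned) that fits the same circle-separated data with a built-in polynomial kernel, so no explicit lift to a higher dimension is ever constructed:

```python
import numpy as np
from sklearn.svm import SVC  # kernelized SVM, solved via quadratic programming

rng = np.random.default_rng(0)
X = rng.uniform(-2, 2, size=(400, 2))
y = np.where(X[:, 0] ** 2 + X[:, 1] ** 2 < 1.0, 1, -1)

# A degree-2 polynomial kernel covers exactly the conic-section boundaries
# discussed above; the feature transformation stays implicit (kernel trick).
clf = SVC(kernel="poly", degree=2, coef0=1.0, C=10.0).fit(X, y)
print("training accuracy with the polynomial kernel:", clf.score(X, y))
print("number of support vectors per class:", clf.n_support_)
```

Swapping in kernel="rbf" (the default for SVC) is the usual choice when you do not know a good feature space in advance.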

Part of chapter: Recaps

Accessible via: Open access

Duration: 00:12:16 min

Recording date: 2021-03-30

Uploaded on: 2021-03-31 11:57:37

Language: en-US

Recap: Support Vector Machines
Main video on the topic in chapter 8, clip 18.
