OK, good. Welcome back to AI2. To warn you, I have to leave at one today, so we'll have a short lecture here. Yesterday, we completed the section of the course where we talked about learning from examples via essentially linear regression, which is, again, curve fitting in a hypothesis space given by a very simple model. We talked about straight-out linear regression and classification. We talked about how to use those in bio-inspired computation in neural networks, which is basically having networks of linear classifiers, for which you again do weight fitting. And the last thing we looked at was support vector machines. There we are essentially doing linear regression or classification again, only this time we add two more tricks, which make this extremely useful. And once it is well implemented, you can basically use support vector machines out of the box, and there are lots of packages around that give you this.
OK, so what are the basic ideas? Well, the basic idea is: don't just... Ah, that's interesting, because without the beamer it worked. Good, so I have to go back to the other one. How do I do this? OK, here I am. Yes. So the idea is the following.
Instead of just any linear classifier, we want a linear classifier that keeps maximum distance to all the examples. The hope is that this generalizes better than a randomly chosen one: with plain linear regression or classification, we don't know which of these separators we will actually get. So instead of doing straight-out error minimization over the space of hypotheses, we minimize the error and at the same time maximize the distance to the nearest examples. In a way, instead of having a thin classifier, we are also optimizing the thickness of a classifier that already works. OK, that's the first idea: get better generalization properties by keeping our distance. And the way this works out is that, miraculously, the method will basically take only the support vectors, the examples that are closest, into account for classification, for the weight fitting. That's the one idea.
And you can do this by the old minimization trick, by adding a new parameter for the margin, which you also subject to optimization. But you can also do something else, which is what is actually done: you can use quadratic programming methods, which are more efficient in practice.
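(To make the two routes a bit more concrete, here is a sketch in standard SVM notation, not taken verbatim from the slides: the margin-maximization problem for labels $y_j \in \{-1, +1\}$, and its quadratic-programming dual.)

$$\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\,\lVert\mathbf{w}\rVert^2 \quad\text{subject to}\quad y_j\,(\mathbf{w}\cdot\mathbf{x}_j + b) \ge 1 \ \text{ for all examples } j$$

$$\max_{\boldsymbol{\alpha}}\ \sum_j \alpha_j \;-\; \tfrac{1}{2}\sum_{j,k} \alpha_j\,\alpha_k\, y_j\, y_k\,(\mathbf{x}_j\cdot\mathbf{x}_k) \quad\text{subject to}\quad \alpha_j \ge 0,\ \ \sum_j \alpha_j\, y_j = 0$$

(The examples with $\alpha_j > 0$ are exactly the support vectors, and the data enter the dual only through dot products $\mathbf{x}_j\cdot\mathbf{x}_k$, which is what the kernel trick exploits.)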
They also allow, and that is the important part here, the so-called kernel trick, which is not as easily done in the gradient-descent minimization method. And the idea here is that if you have sets that are not linearly separable, you can sometimes transform them into higher dimensions to make them linearly separable. The example that everybody uses, because it goes between two and three dimensions and not between three or five and 2,000 dimensions, which is what actually happens in practice, is this one, where you have a circle-shaped separator, which is not linear, of course, but you can transform the data into this cone-shaped distribution, where, as you know, if you have a cone, then a circle is just one way of cutting the cone, namely when the normal vector of the cutting plane is collinear with the axis of the cone. And then you get a linear separator here.
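(As a sketch of the transformation behind this picture, using the standard textbook mapping rather than necessarily the exact one on the slide: map each point $(x_1, x_2)$ to three dimensions via

$$\phi(x_1, x_2) \;=\; \bigl(x_1^2,\; x_2^2,\; \sqrt{2}\,x_1 x_2\bigr).$$

A circular decision boundary $x_1^2 + x_2^2 = r^2$ in the original space then becomes the plane $f_1 + f_2 = r^2$ in the new coordinates $(f_1, f_2, f_3)$, that is, a linear separator.)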
Now in principle, that is something you can always do. But the necessary transformations, in this example this one here, can play badly with your computation. And the advantage of the quadratic programming approach is that, as you can see, the data, the x part here, only enters in the form of a dot product, which is the stuff you actually feed into the kernel function. And very often, you can compute the kernel directly from that dot product without ever really computing the higher-dimensional transformation itself. The transformation disappears into the woodwork. And of course, that makes this approach very attractive and computationally efficient. That's what SVM packages use. They give you a standard set of kernels, and then you can kind of project up, and that often gives you a separable feature space.
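(Here is a minimal numerical sketch, my own illustration rather than the lecturer's example, of what "computing the kernel without computing the transformation" means, for the degree-2 polynomial kernel on 2D inputs.)

```python
# A small check (illustration only, not from the lecture) that a polynomial kernel
# equals a dot product in a higher-dimensional feature space -- without that space
# ever being constructed when the kernel is evaluated directly.
import numpy as np

def phi(x):
    # Explicit embedding behind the degree-2 polynomial kernel in 2D:
    # (x1, x2) -> (x1^2, sqrt(2)*x1*x2, x2^2)
    x1, x2 = x
    return np.array([x1**2, np.sqrt(2) * x1 * x2, x2**2])

def poly_kernel(x, z):
    # Kernel evaluated directly on the original 2D inputs.
    return np.dot(x, z) ** 2

x = np.array([1.0, 2.0])
z = np.array([3.0, -1.0])

print(np.dot(phi(x), phi(z)))  # embed explicitly, then take the dot product
print(poly_kernel(x, z))       # same value, the embedding is never computed
```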
And you can imagine that with these kinds of ideas alone, with these cone-shaped embeddings, you get all kinds of dividing lines as conic sections: circles, ellipses, parabolas, and so on. So if you adjust the weights here, instead of just having linear models, you essentially get conic-section models, just by this little trick.
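(To close the loop with the "out of the box" remark from the beginning, here is a short sketch of what using such a package looks like, with scikit-learn's SVC on a circularly separable data set; the class and function names come from that library, everything else is my own illustration.)

```python
# Sketch: fit an off-the-shelf SVM with a kernel on data that is not linearly
# separable in its original 2D form, then inspect the support vectors.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: a circle-shaped separator is needed in the original space.
X, y = make_circles(n_samples=200, noise=0.05, factor=0.5, random_state=0)

# The RBF kernel lets the quadratic-programming solver work in a higher-dimensional
# feature space without ever constructing it explicitly.
clf = SVC(kernel="rbf", C=1.0)
clf.fit(X, y)

print("training accuracy:", clf.score(X, y))
print("support vectors per class:", clf.n_support_)
# Only these closest examples determine the learned classifier.
print("first few support vectors:\n", clf.support_vectors_[:5])
```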