22 - Artificial Intelligence II [ID:9366]

OK, good. Welcome back to AI2. To warn you, I have to leave at 1 today, so we'll have a short lecture here. Yesterday we completed the section of the course where we talked about learning from examples via essentially linear regression, which is again curve fitting in a hypothesis space given by a very simple model. We had talked about straight-out linear regression and classification. We talked about how to use those in bio-inspired computation in neural networks, which is basically having networks of linear classifiers, for which you again do weight fitting. And the last thing we looked at was support vector machines. Essentially, we're doing linear regression or classification again, only this time we add two more tricks, which makes this extremely useful. Once it's well implemented, you can basically use support vector machines out of the box, and there are lots of packages around that give you this.

OK, so what are the basic ideas? Well, the basic idea is don't just... Ah, that's interesting, because without the beamer it worked. Good, so I have to go back to the other one. How do I do this? OK, here I am. Yes. So the idea is that instead of just any linear classifier, we want to have a linear classifier that keeps maximum distance to all the examples. And the hope is that this generalizes better than a randomly chosen one: with plain linear regression we don't know which of these separators we'll actually get. So instead of doing straight-out error minimization over the space of hypotheses, we minimize the error together with maximizing the distance to the examples, the margin. In a way, instead of having a thin classifier, we're also optimizing the thickness of our classifier. OK, that's the first idea: get better generalization properties by keeping our distance. And the way this works out is that, miraculously, the method will basically only take the support vectors, the examples that are closest to the separator, into account for classification, for the weight fitting. That's the one idea.

And you can do this with just the old minimization trick, by adding a new kind of parameter that is also subject to minimization. But you can also do something else, which is what's actually done: you can use quadratic programming methods, which are more efficient in practice.
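As a minimal illustration of this first idea (not code from the lecture): the sketch below trains a linear SVM on a small made-up toy set with scikit-learn's SVC, which internally relies on a quadratic-programming-style solver, and then reads off the learned weights and the support vectors, the only examples that end up determining the separator.

```python
# Minimal sketch (toy data and parameters are my own, not from the lecture):
# a near-hard-margin linear SVM; only the closest examples become support vectors.
import numpy as np
from sklearn.svm import SVC

# Toy data: two linearly separable point clouds in 2D.
X = np.array([[1.0, 1.0], [2.0, 1.5], [1.5, 2.0],    # class -1
              [4.0, 4.0], [5.0, 4.5], [4.5, 5.0]])   # class +1
y = np.array([-1, -1, -1, 1, 1, 1])

# A large C approximates a hard margin: errors are heavily penalized,
# so the optimizer concentrates on maximizing the margin.
clf = SVC(kernel="linear", C=1e6)
clf.fit(X, y)

print("weights  w =", clf.coef_[0])        # normal vector of the separating line
print("offset   b =", clf.intercept_[0])
print("support vectors:\n", clf.support_vectors_)   # only these determine w and b
```

Running this prints the weight vector, the offset, and typically only a few of the six points as support vectors; moving any of the other points around without crossing the margin would not change the learned separator.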

Quadratic programming methods also allow, and that's the important part, the so-called kernel trick, which is not as easily done in the minimization, gradient-descent method. The idea here is that if you have sets that are not linearly separable, you can sometimes transform them into higher dimensions to make them linearly separable. The example everybody uses, because it goes from two to three dimensions rather than from three or five to 2,000 dimensions, which is what actually happens in practice, is this one, where you have a circle-shaped separator, which is of course not linear, but you can transform the data into this cone-shaped distribution, where, as you know, a circle is just one way of cutting a cone, namely when the normal vector of the cutting plane is collinear with the axis of the cone. And then you get a linear separator here.
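To make that lifting concrete, here is a small sketch (my own illustration, not necessarily the lecture's exact construction): points labelled by whether they lie inside a circle of radius r get an extra coordinate z = sqrt(x1^2 + x2^2), so they land on a cone, and the horizontal plane z = r becomes a linear separator.

```python
# Sketch of the 2D -> 3D lifting (assumed feature map, for illustration only).
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(200, 2))
r = 1.0
y = np.where(np.linalg.norm(X, axis=1) <= r, -1, 1)   # -1 inside the circle, +1 outside

# Lift to 3D: (x1, x2) -> (x1, x2, sqrt(x1^2 + x2^2)); the points now sit on a cone.
z = np.linalg.norm(X, axis=1)
X_lifted = np.column_stack([X, z])

# In the lifted space the horizontal plane z = r separates the classes linearly.
predictions = np.where(X_lifted[:, 2] <= r, -1, 1)
print("separated correctly:", np.all(predictions == y))
```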

Now in principle, that's something you can always do. But the necessary transformations, in this example this one here, can play badly with your calculation. And the advantage of the quadratic programming approach is that, as you can see, the data, the x part here, only enters in the form of a dot product, which is the thing you actually feed into the kernel function. And very often you can compute the kernel of that dot product without ever really computing the embedding itself. The embedding disappears into the woodwork. And of course that makes this approach very attractive and computationally efficient. And that's what SVM packages use: they give you a standard set of kernels, and then you can kind of project up, and that often gives you a linearly separable feature space.
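As a concrete instance of this (again my own illustration, with an assumed quadratic kernel): for K(x, y) = (x · y)^2, the kernel value agrees with the dot product of the explicitly embedded vectors phi(x) = (x1^2, sqrt(2)·x1·x2, x2^2), yet evaluating K never requires building phi.

```python
# Check that the quadratic kernel equals a dot product in an embedded space,
# without the SVM ever having to construct that embedding explicitly.
import numpy as np

def phi(v):
    """Explicit degree-2 embedding (only needed here to verify the identity)."""
    return np.array([v[0]**2, np.sqrt(2) * v[0] * v[1], v[1]**2])

def K(x, y):
    """Quadratic kernel, computed directly from the dot product in input space."""
    return np.dot(x, y) ** 2

x = np.array([1.0, 2.0])
y = np.array([3.0, -1.0])

print(K(x, y))                    # (1*3 + 2*(-1))^2 = 1.0
print(np.dot(phi(x), phi(y)))     # same value, via the explicit embedding
```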

And you can imagine that already with this kind of idea, with these cone-shaped embeddings, you get all kinds of dividing lines, circles, ellipses, parabolas, and so on, as conic sections at any point. So if you adjust the weights here, instead of just having linear models, you essentially get conic-section models just by this little trick. And many of ...
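A hedged sketch of this last point (data and parameters are my own choices, not from the lecture): an off-the-shelf SVM with a degree-2 polynomial kernel, one of the standard kernels a package like scikit-learn ships with, learns a conic-section boundary, here a circle, without any manual lifting.

```python
# Degree-2 polynomial kernel: the decision boundary is a conic section,
# so the circle data from above is separated without explicit embedding.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X = rng.uniform(-2.0, 2.0, size=(300, 2))
y = np.where(np.linalg.norm(X, axis=1) <= 1.0, -1, 1)

clf = SVC(kernel="poly", degree=2, coef0=1.0, C=10.0)
clf.fit(X, y)
print("training accuracy:", clf.score(X, y))   # close to 1.0: a circle is a conic
```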

Part of a video series:
Accessible via: Open access
Duration: 00:51:25 min
Recording date: 2018-06-28
Uploaded on: 2018-06-28 21:14:52
Language: en-US

This course deals with the foundations of Artificial Intelligence (AI), in particular techniques for reasoning under uncertainty, machine learning, and language understanding.
The course builds on the lecture Artificial Intelligence I from the winter semester and continues it.

Learning objectives and competencies
Subject, learning, and methodological competence

  • Knowledge: Students become acquainted with fundamental representation formalisms and algorithms of Artificial Intelligence.

  • Application: The concepts are applied to examples from the real world (exercises).

  • Analysis: Through modelling in the machine, students learn to better assess human intelligence capabilities.
