20 - Pattern Recognition (PR) [ID:2627]

The following content has been provided by the University of Erlangen-Nürnberg.

Last Tuesday there were two questions, and I'd like to come back to these questions right now. The first question was about the Lagrange multiplier method in general. I looked on the Internet, and here is a document from the University of Saarland; here is the Internet address. You can just Google for University of Saarland and this title, and it is a nice example of how this method actually works.

We have a problem: a function f of two parameters x and y, which computes the product of both parameters. As a geometric interpretation, x and y are the side lengths of a rectangle and f computes its area. And you have a constraint, given here by a function g, which says that two times x plus two times y should equal u. So the perimeter should be a given number u, and you want to find the parameters that maximize the area of your rectangle for that perimeter.

There's one easy solution. It's easy because you can solve this constraint for x, plug it into your function f, compute the derivative, and solve.

The other possibility, which is more general, is to use the Lagrange multiplier method. So you set up the Lagrange function, which is your function f of x and y plus the constraint multiplied by the Lagrange multiplier. Then you require that the derivative with respect to x, the derivative with respect to y, and the derivative with respect to the Lagrange multiplier all be zero. That is an important condition, and if you solve it, you get values for x and y and also for lambda.

So you see here that y has to be minus two times lambda. The same holds for x, so you can conclude that x has to be equal to y. Then you can plug this result into the constraint, and you get a value for x, which is u divided by four. So this answers the first question, and you can see how this technique works in general. This was an example with an equality constraint, but it works analogously for inequality constraints (where the Karush-Kuhn-Tucker conditions apply).
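Written out in full, the computation just described looks like this (a sketch reconstructed from the spoken description; L denotes the Lagrangian and lambda the multiplier):

\[
\begin{aligned}
&\text{maximize } f(x,y) = xy \quad \text{subject to} \quad g(x,y) = 2x + 2y - u = 0,\\
&L(x,y,\lambda) = xy + \lambda\,(2x + 2y - u),\\
&\frac{\partial L}{\partial x} = y + 2\lambda = 0 \;\Rightarrow\; y = -2\lambda, \qquad
 \frac{\partial L}{\partial y} = x + 2\lambda = 0 \;\Rightarrow\; x = -2\lambda,\\
&\frac{\partial L}{\partial \lambda} = 2x + 2y - u = 0 \;\Rightarrow\; 4x = u \;\Rightarrow\; x = y = \frac{u}{4}.
\end{aligned}
\]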

The second question was: if we have the Lagrange function for our SVM problem and it is a convex function, why don't we use the optimization techniques that were introduced before, that is, descent methods? Actually, it really is a convex function, and you can use descent methods.

The benefit of the other approach is that if we go to the dual problem, we get rid of the parameters alpha and alpha zero. So we get rid of a large number of parameters and just have to focus on the Lagrange multipliers and the slack variables. It's always better to optimize a problem with a smaller number of parameters. And descent methods have a problem in general: we talked about the step size. You have to choose an appropriate step size, and in general these methods are quite slow. So if you can solve a problem in closed form, that's always better. However, as I was told, there really are implementations that are based on descent methods.
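As a minimal illustration of the step-size issue (not from the lecture; the quadratic objective below is a made-up example), here is plain gradient descent in Python:

```python
import numpy as np

# Minimize the convex quadratic f(x) = 0.5 * x^T A x, whose gradient is A x.
# A made-up, mildly ill-conditioned problem to show the role of the step size.
A = np.diag([3.0, 0.5])

def gradient_descent(step, iters=200):
    x = np.array([1.0, 1.0])
    for _ in range(iters):
        x = x - step * (A @ x)  # one descent step with a fixed step size
    return x

print(gradient_descent(step=0.1))  # converges toward the minimizer [0, 0]
print(gradient_descent(step=0.7))  # diverges: 0.7 > 2 / largest eigenvalue of A
```

A line search or a decaying step size would fix the divergence, but tuning this is exactly the inconvenience mentioned above.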

Here is the dual Lagrange function, and you see this function here is only in the parameters lambda; you get rid of the original parameters alpha and alpha zero. Of course, the other Lagrange multipliers are missing here, and the slack variables are missing, because we are looking at the hard-margin case.
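For reference, the hard-margin dual in standard notation reads as follows (a textbook form, not copied from the slide; the x_i are the training samples, y_i in {-1, +1} the class labels, and lambda_i the Lagrange multipliers):

\[
\max_{\lambda}\; L_D(\lambda) = \sum_{i=1}^{N} \lambda_i
  - \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{N} \lambda_i \lambda_j\, y_i y_j\, x_i^{\mathsf{T}} x_j
\qquad \text{subject to} \quad \lambda_i \ge 0, \quad \sum_{i=1}^{N} \lambda_i y_i = 0.
\]

Note that the training samples enter only through the inner products x_i^T x_j; this is exactly the point where the kernel trick discussed next applies.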

So yesterday we were talking about the kernel trick, and what is the message of the kernel trick? Assume we have a classification problem and we want to separate two classes: we have examples of class plus one here, and let's say we have examples of class minus one here. As you see, this is not a linear classification problem; you can't solve it with a linear decision boundary. The decision boundary should look like this. Now you want to use support vector machine techniques, so you are looking for a band, as wide as possible, that separates the two areas. And the question is: how can you solve this problem with support vector machines? We know that we can substitute all inner products by a kernel function.
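As a minimal sketch of this situation in code (not from the lecture; it assumes scikit-learn is available and uses its make_circles data to mimic the drawing of one class surrounding the other):

```python
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric classes: not separable by a linear decision boundary.
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

# A linear SVM cannot separate the rings ...
linear_svm = SVC(kernel="linear").fit(X, y)
print("linear kernel accuracy:", linear_svm.score(X, y))

# ... but substituting the inner products by an RBF kernel,
# k(x, x') = exp(-gamma * ||x - x'||^2), makes the problem separable.
rbf_svm = SVC(kernel="rbf", gamma=2.0).fit(X, y)
print("RBF kernel accuracy:", rbf_svm.score(X, y))
```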
