29 - Pattern Recognition [PR] - PR 25 [ID:23537]

Welcome back to Pattern Recognition. Today we want to talk again about support vector machines, but this time we want to recall what we learned about duality and convex optimization and apply it to our support vector machines. So let's see what more we can learn about support vector machines.

So this is the second part on support vector machines, and we are back at the hard-margin problem. Here we see that the SVM can be formulated as minimizing the squared norm of our vector α. You remember that α is essentially the normal vector of our decision boundary. We then have to constrain the y_i, which are multiplied with the signed distance to the decision boundary, and this product minus one needs to be greater than or equal to zero, because we want to preserve the margin. In the soft-margin case, you remember that we introduced additional slack variables. Here we had the ξ_i, and the ξ_i should help us to move points that lie on the wrong side of the decision boundary back to the correct side. With this, we are also able to find solutions in cases where the classes are not linearly separable. We then need to adjust our constraints to include the ξ_i, and we also need to add the additional constraint −ξ_i ≤ 0.
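As a quick reference, the two problems just described can be written out as follows; this is a sketch using α for the normal vector, α_0 for the bias, m for the number of training samples, and the conventional factor 1/2 in front of the squared norm (which matches the −1/2 that appears in the dual later in this video):

$$
\min_{\alpha,\,\alpha_0} \; \frac{1}{2}\|\alpha\|^2 \quad \text{s.t.} \quad y_i\,(\alpha^\top x_i + \alpha_0) - 1 \ge 0, \quad i = 1,\dots,m
$$

and, for the soft-margin case,

$$
\min_{\alpha,\,\alpha_0,\,\xi} \; \frac{1}{2}\|\alpha\|^2 + \mu \sum_{i=1}^{m} \xi_i \quad \text{s.t.} \quad y_i\,(\alpha^\top x_i + \alpha_0) - 1 + \xi_i \ge 0, \quad -\xi_i \le 0, \quad i = 1,\dots,m.
$$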

Now let's set this up in the Lagrangian. The solution of the constrained convex optimization problem requires the Lagrangian function. Here you have the squared norm of the normal vector, then some μ times the sum over the ξ_i, then minus the sum over μ_i ξ_i, and minus the sum over our constraints, which is where the λ_i come in. One thing we can do is to treat the μ in front of the sum over the ξ_i as a meta-parameter: we choose it as a constant C and can therefore remove it from the optimization. What remains are the Lagrange multipliers, the μ_i and the λ_i.
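Written out as a sketch, with the μ in front of the slack sum already renamed to the constant C and with λ_i, μ_i ≥ 0 as the Lagrange multipliers, this Lagrangian reads:

$$
L(\alpha, \alpha_0, \xi, \lambda, \mu) = \frac{1}{2}\|\alpha\|^2 + C \sum_{i=1}^{m} \xi_i - \sum_{i=1}^{m} \mu_i\,\xi_i - \sum_{i=1}^{m} \lambda_i \left[ y_i\,(\alpha^\top x_i + \alpha_0) - 1 + \xi_i \right].
$$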

Now, if we go ahead and compute the partial derivative of this Lagrangian with respect to our parameter α, you see that it gives α minus the sum over λ_i y_i x_i, and this needs to be zero. We can solve this, which gives us α equal to the sum over λ_i y_i x_i. This somehow seems familiar, right? We have seen a similar update rule in the case of the perceptron, where we had y_i multiplied with x_i as an update step. Here we see that if we sum over the entire training data set, we can essentially map this back to our original parameter α. That is a very interesting observation: we are able to express the variable that we are actually looking for in the primal problem entirely in terms of the training data set and our Lagrange multipliers.
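In symbols, the stationarity condition just described is:

$$
\frac{\partial L}{\partial \alpha} = \alpha - \sum_{i=1}^{m} \lambda_i\, y_i\, x_i = 0 \quad\Longrightarrow\quad \alpha = \sum_{i=1}^{m} \lambda_i\, y_i\, x_i.
$$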

Let's look at some more partial derivatives. The partial derivative of the Lagrangian with respect to α_0 is simply minus the sum over λ_i y_i, which needs to be zero. The next partial derivative is the one with respect to ξ_i, and this gives C minus μ_i minus λ_i, which is also required to be zero.
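Written out, these two conditions are:

$$
\frac{\partial L}{\partial \alpha_0} = -\sum_{i=1}^{m} \lambda_i\, y_i = 0, \qquad \frac{\partial L}{\partial \xi_i} = C - \mu_i - \lambda_i = 0.
$$

Since the μ_i are multipliers of inequality constraints and hence non-negative, the second condition also implies 0 ≤ λ_i ≤ C in the soft-margin case.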

If we now use these results in our Lagrangian function — here, the Lagrangian for the hard-margin case — we can rearrange it a little bit. You see that I can pull α out of the sum and rearrange the sums. Then the second term contains the sum over λ_i y_i x_i transposed, so this sum can be replaced by α transposed. The third term contains the sum over λ_i y_i, which needs to be zero and therefore drops out. What remains is the sum over the λ_i, so we can now write the whole thing as minus one half times the double sum over λ_i λ_j y_i y_j x_i transposed x_j, plus the sum over the λ_i.
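As a sketch, the rearrangement just described, starting from the hard-margin Lagrangian and using α = Σ_i λ_i y_i x_i and Σ_i λ_i y_i = 0:

$$
\begin{aligned}
L(\alpha, \alpha_0, \lambda) &= \frac{1}{2}\,\alpha^\top \alpha - \sum_{i=1}^{m} \lambda_i \left[ y_i\,(\alpha^\top x_i + \alpha_0) - 1 \right] \\
&= \frac{1}{2}\,\alpha^\top \alpha - \alpha^\top \sum_{i=1}^{m} \lambda_i\, y_i\, x_i - \alpha_0 \sum_{i=1}^{m} \lambda_i\, y_i + \sum_{i=1}^{m} \lambda_i \\
&= -\frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \lambda_i\, \lambda_j\, y_i\, y_j\, x_i^\top x_j + \sum_{i=1}^{m} \lambda_i.
\end{aligned}
$$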

This is also interesting, because now in the dual we no longer have the original α, the variable that we were actually optimizing for in the primal. You see that this allows us to rearrange the dual, and since α no longer appears in it, we do not even have to work with the infimum and so on: we are able to derive the Lagrange dual function directly from the primal optimization problem. This is a very interesting observation. Also note that our problem is strictly convex, so we can find a very nice solution here for the dual.

This then leads us to the following optimization problem, the Lagrange dual problem, where we maximize minus one half times the double sum over the λ's, the y's, and the x_i's, plus the sum over the λ's, subject to all λ_i being greater than or equal to zero. And of course we still have to put in the constraint that the products λ_i y_i need to sum up to zero.
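Collected in one place, this Lagrange dual problem is:

$$
\max_{\lambda} \; -\frac{1}{2} \sum_{i=1}^{m} \sum_{j=1}^{m} \lambda_i\, \lambda_j\, y_i\, y_j\, x_i^\top x_j + \sum_{i=1}^{m} \lambda_i \quad \text{s.t.} \quad \lambda_i \ge 0 \;\; \forall i, \qquad \sum_{i=1}^{m} \lambda_i\, y_i = 0.
$$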

So this constraint still has to be fulfilled, even though the corresponding term cancelled out of the equation during the derivation. This has a couple
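To make the whole derivation concrete, here is a minimal numerical sketch (not from the video): it solves this hard-margin dual for a small toy data set with scipy.optimize.minimize and then recovers α and α_0 from the λ_i. The data, the variable names, and the choice of the SLSQP solver are illustrative assumptions of mine, not part of the lecture.

import numpy as np
from scipy.optimize import minimize

# Toy, linearly separable 2-D data (hypothetical example values).
X = np.array([[2.0, 2.0], [3.0, 3.0], [2.5, 1.0],
              [-1.0, -1.0], [-2.0, -1.5], [-1.5, 0.5]])
y = np.array([1.0, 1.0, 1.0, -1.0, -1.0, -1.0])
m = len(y)

# H_ij = y_i y_j x_i^T x_j, the matrix inside the double sum of the dual.
H = (y[:, None] * X) @ (y[:, None] * X).T

# Negative dual objective: we minimize the negative of what the video maximizes.
def neg_dual(lmbda):
    return 0.5 * lmbda @ H @ lmbda - lmbda.sum()

constraints = [{"type": "eq", "fun": lambda lmbda: lmbda @ y}]  # sum_i lambda_i y_i = 0
bounds = [(0.0, None)] * m                                      # lambda_i >= 0 (hard margin)

res = minimize(neg_dual, x0=np.zeros(m), bounds=bounds,
               constraints=constraints, method="SLSQP")
lmbda = res.x

# Recover the primal solution: alpha = sum_i lambda_i y_i x_i
alpha = (lmbda * y) @ X
# alpha_0 from a support vector (largest lambda_i): y_i (alpha^T x_i + alpha_0) = 1
sv = np.argmax(lmbda)
alpha_0 = y[sv] - alpha @ X[sv]
print("lambda =", np.round(lmbda, 3))
print("alpha  =", alpha, " alpha_0 =", alpha_0)

The support vectors are the points whose λ_i come out noticeably larger than zero; by complementary slackness, exactly those points satisfy y_i(α^T x_i + α_0) = 1.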

Part of a video series:

Accessible via: Open Access

Duration: 00:18:00 min

Recording date: 2020-11-11

Uploaded on: 2020-11-11 16:47:30

Language: en-US

In this video, we see how strong duality helps us with the support vector machine optimization.

This video is released under CC BY 4.0. Please feel free to share and reuse.

For reminders to watch the new video, follow us on Twitter or LinkedIn. Also, join our network for information about talks, videos, and job offers in our Facebook and LinkedIn groups.

Music Reference: Damiano Baldoni - Thinking of You
