Welcome back to Pattern Recognition. Today we want to talk about support vector machines again, but this time we want to recall what we learned about duality and convex optimization and apply it to our support vector machines. So let's see what more we can learn about support vector machines.
So this is the second part on support vector machines, and we are back at the hard margin problem. Here we see that the SVM can be formulated as minimizing the squared norm of our vector α. You remember that α is essentially the normal vector of our decision boundary. Then we have the constraints that y_i, multiplied with the signed distance to the decision boundary, minus one needs to be greater than or equal to zero, because we want to preserve the margin. In the soft margin case, you remember that we introduced additional slack variables ξ_i. The ξ_i should help us to move points that end up on the wrong side of the decision boundary back to the correct side, and with them we are also able to find solutions in cases where the classes are not linearly separable. We also need to adjust our constraints to include the ξ_i, and we need the additional constraint that -ξ_i is smaller than or equal to zero. Both formulations are written out below.
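As a reminder, the two primal problems can be written as follows. This is a sketch in the lecture's notation with α as the normal vector and α₀ as the bias, assuming m training samples x_i with labels y_i in {-1, +1}; the factor 1/2 in front of the norm and the symbol C for the slack penalty are the usual conventions and may differ slightly from the slides.

\[
\min_{\alpha,\alpha_0}\ \frac{1}{2}\lVert\alpha\rVert^2
\quad\text{subject to}\quad
y_i\,(\alpha^{T}x_i + \alpha_0) - 1 \ge 0, \qquad i = 1,\dots,m
\]

\[
\min_{\alpha,\alpha_0,\xi}\ \frac{1}{2}\lVert\alpha\rVert^2 + C\sum_{i=1}^{m}\xi_i
\quad\text{subject to}\quad
y_i\,(\alpha^{T}x_i + \alpha_0) - 1 + \xi_i \ge 0, \qquad -\xi_i \le 0
\]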
Now let's set this up as a Lagrangian. The solution of the constrained convex optimization problem requires the Lagrangian function: we have the squared norm of the normal vector, then a weight times the sum over the ξ_i, then minus the sum over μ_i ξ_i, and finally minus the sum over the constraints, and this is where the λ_i come in. One thing we can do is treat the weight on the slack sum as a meta parameter: we choose it as a constant C and thereby remove it from the optimization. This leaves us with the Lagrange multipliers μ_i and λ_i.
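Written out under these assumptions (C fixed as a hyperparameter, and μ_i ≥ 0, λ_i ≥ 0 as the Lagrange multipliers), the soft-margin Lagrangian is, as a sketch:

\[
L(\alpha,\alpha_0,\xi,\lambda,\mu)
= \frac{1}{2}\lVert\alpha\rVert^2
+ C\sum_{i=1}^{m}\xi_i
- \sum_{i=1}^{m}\mu_i\,\xi_i
- \sum_{i=1}^{m}\lambda_i\,\bigl[\,y_i\,(\alpha^{T}x_i + \alpha_0) - 1 + \xi_i\,\bigr]
\]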
Now if we go ahead and compute the partial derivative of the Lagrangian with respect to our parameter α, you see that this gives α minus the sum over λ_i y_i x_i, and this needs to be zero. Solving for α gives us α as the sum over λ_i y_i x_i, and this somehow seems familiar, right? We have seen similar update rules in the case of the perceptron, where y_i multiplied with x_i appeared as an update step. Here we see that, with the sum running over the entire training data set, we can map this directly onto our original parameter α; the condition is written out below. That is a very interesting observation: we are able to express the variable that we are actually looking for in the primal problem completely in terms of the training data set and the Lagrange multipliers.
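Concretely, the stationarity condition with respect to α reads (in the assumed notation):

\[
\frac{\partial L}{\partial \alpha}
= \alpha - \sum_{i=1}^{m}\lambda_i\,y_i\,x_i = 0
\quad\Longrightarrow\quad
\alpha = \sum_{i=1}^{m}\lambda_i\,y_i\,x_i
\]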
Let's look at some more partial derivatives. The partial derivative of the Lagrangian with respect to α₀ simply gives minus the sum over λ_i y_i, and this needs to be zero. The next partial derivative is the one with respect to ξ_i, and this gives C minus μ_i minus λ_i, which is also required to be zero. Both conditions are summarized below.
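The remaining stationarity conditions, again as a sketch in the assumed notation:

\[
\frac{\partial L}{\partial \alpha_0} = -\sum_{i=1}^{m}\lambda_i\,y_i = 0,
\qquad
\frac{\partial L}{\partial \xi_i} = C - \mu_i - \lambda_i = 0
\]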
If we now use these conditions in the Lagrangian function (the ξ_i terms drop out because C - μ_i - λ_i = 0, so we are effectively back at the hard-margin Lagrangian), we can rearrange it a little bit. You see that I can pull α out of the sum and reorder the sums. You then see that the second term contains the sum over λ_i y_i x_i^T, so this can be replaced with α^T. The third term is α₀ times the sum over λ_i y_i, which needs to be zero. What remains is the sum over the λ_i. So we can essentially write the whole thing as minus one half times the double sum over λ_i λ_j y_i y_j x_i^T x_j, plus the sum over the λ_i, as sketched below.
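Plugging in α = Σ_i λ_i y_i x_i and using Σ_i λ_i y_i = 0, the substitution works out roughly like this:

\[
L = \frac{1}{2}\,\alpha^{T}\alpha
- \sum_{i=1}^{m}\lambda_i\,y_i\,\alpha^{T}x_i
- \alpha_0\sum_{i=1}^{m}\lambda_i\,y_i
+ \sum_{i=1}^{m}\lambda_i
= \sum_{i=1}^{m}\lambda_i
- \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\lambda_i\,\lambda_j\,y_i\,y_j\,x_i^{T}x_j
\]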
This is also interesting because in the dual we do not have the original α anymore. α was the variable that we were optimizing for in the primal, and here you see that this rearrangement directly gives us the dual, and in the dual the α no longer appears. So we do not even have to work with the infimum and so on; we are able to derive the Lagrange dual function directly from the primal optimization problem. This is a very interesting observation. Also note that our problem is strictly convex, so we can find a very nice solution here for the dual.
This then leads us to the following optimization problem, the Lagrange dual problem: we maximize minus one half times the double sum over the λ's, the y's, and the x_i's, plus the sum over the λ's, subject to all λ_i being greater than or equal to zero. And of course, we still have to put in the constraint that the products λ_i y_i sum up to zero,
so this still has to be fulfilled explicitly, even though the corresponding term dropped out when we substituted into the Lagrangian. The full dual problem is written out below.
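As a summary, a sketch of the Lagrange dual problem in the assumed notation (in the soft-margin case, C - μ_i - λ_i = 0 together with μ_i ≥ 0 additionally bounds the multipliers by λ_i ≤ C):

\[
\max_{\lambda}\ \sum_{i=1}^{m}\lambda_i
- \frac{1}{2}\sum_{i=1}^{m}\sum_{j=1}^{m}\lambda_i\,\lambda_j\,y_i\,y_j\,x_i^{T}x_j
\quad\text{subject to}\quad
\lambda_i \ge 0, \qquad \sum_{i=1}^{m}\lambda_i\,y_i = 0
\]

To make this concrete, here is a minimal numerical sketch that solves the hard-margin dual on a tiny hypothetical toy set with a general-purpose solver; the data, variable names, and the use of scipy.optimize.minimize are illustrative assumptions and not part of the lecture.

import numpy as np
from scipy.optimize import minimize

# Hypothetical toy data: two linearly separable classes, purely for illustration.
X = np.array([[2.0, 2.0], [3.0, 3.0], [-1.0, -1.0], [-2.0, -1.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])
m = len(y)

# H_ij = y_i y_j x_i^T x_j, the matrix in the quadratic part of the dual objective.
H = (y[:, None] * X) @ (y[:, None] * X).T

# scipy minimizes, so we minimize the negated dual: 0.5 * lam^T H lam - sum(lam).
def neg_dual(lam):
    return 0.5 * lam @ H @ lam - lam.sum()

constraints = {"type": "eq", "fun": lambda lam: lam @ y}  # sum_i lambda_i y_i = 0
bounds = [(0.0, None)] * m                                # lambda_i >= 0 (hard margin)

res = minimize(neg_dual, np.zeros(m), bounds=bounds, constraints=constraints)
lam = res.x

# Recover the primal normal vector via the stationarity condition alpha = sum_i lambda_i y_i x_i.
alpha = (lam * y) @ X

# Bias from a support vector (lambda_i > 0), where y_i (alpha^T x_i + alpha_0) = 1 holds.
sv = np.argmax(lam)
alpha_0 = y[sv] - alpha @ X[sv]

print("lambda =", np.round(lam, 4))
print("alpha  =", np.round(alpha, 4), " alpha_0 =", np.round(alpha_0, 4))

The recovered α and α₀ can then be used to classify a new point x by the sign of α^T x + α₀.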
In this video, we see how strong duality helps us with the support vector machine optimization.
This video is released under CC BY 4.0. Please feel free to share and reuse.
Music Reference: Damiano Baldoni - Thinking of You