Welcome back to pattern recognition.
So today we want to continue looking into AdaBoost, and in particular we want to
see the relation between AdaBoost and the exponential loss.
So, boosting fits an additive model in a set of elementary basis functions.
So the result of boosting is essentially an expansion created from expansion coefficients beta
and basis functions b, each with its own set of parameters gamma.
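Written out, the additive model described here takes the following form (the index m and the number of terms M are notation added for clarity):

f(x) = \sum_{m=1}^{M} \beta_m \, b(x; \gamma_m)

Here each b(x; \gamma_m) is an elementary basis function with its own parameter set \gamma_m, and \beta_m is the corresponding expansion coefficient.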
So these additive expansion methods are very popular in machine learning.
You can see that very similar constructions are applied in single-hidden-layer neural networks
building on the perceptron, in wavelets, in classification trees, and so on.
So the expansion models are typically fit by minimizing a loss function L that is averaged
over the training data.
So you can essentially write this as a sum of L over the entire training set, and if you
plug in our definition, you see that inside the loss we have the additive model, essentially
given by our expansion f(x).
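As a display equation, this fitting problem reads (labelled (1), since the next step refers back to it; (x_i, y_i), i = 1, ..., N, are the training samples):

\min_{\{\beta_m, \gamma_m\}_{m=1}^{M}} \; \sum_{i=1}^{N} L\!\left(y_i, \; \sum_{m=1}^{M} \beta_m \, b(x_i; \gamma_m)\right)    (1)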
So forward stagewise modeling approximates the solution to (1).
New basis functions are added sequentially, parameters and coefficients of already
added functions are not changed, and at each iteration only the subproblem of fitting
a single basis function is solved.
So we can now express the m-th subproblem in the following way:
we essentially take the loss of the (m-1)-th solution plus beta times the current basis function,
and we minimize over gamma and beta.
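Written out, the m-th stagewise subproblem is

(\beta_m, \gamma_m) = \arg\min_{\beta, \gamma} \sum_{i=1}^{N} L\!\left(y_i, \; f_{m-1}(x_i) + \beta \, b(x_i; \gamma)\right)

followed by the update f_m(x) = f_{m-1}(x) + \beta_m \, b(x; \gamma_m).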
Now AdaBoost can be shown to be equivalent to forward stagewise additive modeling using
an exponential loss function.
So if our loss function equals the exponential of minus y times f of x, we are essentially
constructing the AdaBoost loss.
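In symbols, the loss in question is the exponential loss:

L(y, f(x)) = \exp\!\left(-y \, f(x)\right)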
So let's prove this.
For AdaBoost, the basis functions are the individual classifiers G_m, and they produce an output
of either minus one or plus one.
Using the exponential loss function, at every step we now must solve the problem of minimizing,
over beta and G, the sum of the exponential losses.
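Written out, the step-m problem for AdaBoost is

(\beta_m, G_m) = \arg\min_{\beta, G} \sum_{i=1}^{N} \exp\!\left[-y_i \left(f_{m-1}(x_i) + \beta \, G(x_i)\right)\right]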
If we now introduce a weight, which we denote here as w_i, and set it to the exponential of minus
y_i times f_{m-1} of x_i, we can rewrite this as a minimization over a weighted
sum of exponential functions.
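With an explicit iteration index on the weights (the superscript (m) is added here only for clarity), this reads

w_i^{(m)} = \exp\!\left(-y_i \, f_{m-1}(x_i)\right), \qquad (\beta_m, G_m) = \arg\min_{\beta, G} \sum_{i=1}^{N} w_i^{(m)} \exp\!\left(-\beta \, y_i \, G(x_i)\right)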
So we have some observations here.
Since w_i is independent of beta and G(x), it can be seen as a weight that is applied
to each observation.
However, this weight depends on the previously fitted functions, so the weights change with each
iteration m.
This then allows us to reformulate the problem a little.
We split the sum up into the misclassified and the correctly classified samples, and then
we are able to rearrange this minimization expression; you see that we can again use
our indicator function here to represent the misclassified samples.
Now, for every value of beta greater than zero, the solution of this minimization
is found by minimizing the sum of the weights times the indicator function.
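Written out, the rearrangement and the resulting solution for G_m are

\sum_{i=1}^{N} w_i^{(m)} e^{-\beta y_i G(x_i)} = e^{-\beta} \sum_{y_i = G(x_i)} w_i^{(m)} + e^{\beta} \sum_{y_i \ne G(x_i)} w_i^{(m)} = \left(e^{\beta} - e^{-\beta}\right) \sum_{i=1}^{N} w_i^{(m)} \, \mathbb{1}\!\left(y_i \ne G(x_i)\right) + e^{-\beta} \sum_{i=1}^{N} w_i^{(m)}

and therefore, for any \beta > 0,

G_m = \arg\min_{G} \sum_{i=1}^{N} w_i^{(m)} \, \mathbb{1}\!\left(y_i \ne G(x_i)\right)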
So if we plug the reformulated G_m into the objective function and solve for beta_m, this
yields that beta_m is given as one half times the logarithm of one minus err_m
divided by err_m,
where err_m is the minimized weighted error rate. Here you see that it is essentially
the summed weights of the misclassified samples divided by the total sum over the weights.
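In symbols:

\beta_m = \frac{1}{2} \log \frac{1 - \mathrm{err}_m}{\mathrm{err}_m}, \qquad \mathrm{err}_m = \frac{\sum_{i=1}^{N} w_i^{(m)} \, \mathbb{1}\!\left(y_i \ne G_m(x_i)\right)}{\sum_{i=1}^{N} w_i^{(m)}}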
Now, from the update formula of the approximation, we can calculate the weights for the next iteration.
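For completeness, writing this update out from the definitions above: since f_m(x) = f_{m-1}(x) + \beta_m G_m(x), the weight for the next iteration becomes

w_i^{(m+1)} = w_i^{(m)} \, e^{-\beta_m y_i G_m(x_i)} = w_i^{(m)} \, e^{\alpha_m \mathbb{1}\left(y_i \ne G_m(x_i)\right)} \, e^{-\beta_m}, \qquad \alpha_m = 2\beta_m

which is exactly the AdaBoost weight update, up to the factor e^{-\beta_m} that multiplies all weights equally and therefore cancels under normalization.

The following short Python sketch puts the derivation together as an algorithm. It is a minimal illustration under assumptions, not the lecture's reference implementation: the decision-stump weak learner, the function name adaboost_exponential, and all variable names are choices made for this example, and the labels y are assumed to be in {-1, +1}.

```python
# Minimal sketch of AdaBoost as forward stagewise additive modeling with the
# exponential loss (illustrative; names and weak learner are assumptions).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_exponential(X, y, M=50):
    y = np.asarray(y)                    # labels assumed to be in {-1, +1}
    n = len(y)
    w = np.full(n, 1.0 / n)              # initial observation weights w_i^(1)
    learners, betas = [], []
    for m in range(M):
        # G_m: weak classifier fitted to minimize the weighted error rate
        g = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=w)
        pred = g.predict(X)
        # err_m: weighted misclassification rate (boolean mask as indicator)
        err = np.sum(w * (pred != y)) / np.sum(w)
        err = np.clip(err, 1e-12, 1.0 - 1e-12)   # numerical safeguard
        # beta_m = 1/2 * log((1 - err_m) / err_m), as derived above
        beta = 0.5 * np.log((1.0 - err) / err)
        # weight update: w_i <- w_i * exp(-beta_m * y_i * G_m(x_i));
        # the common factor exp(-beta_m) disappears in the normalization below
        w = w * np.exp(-beta * y * pred)
        w /= w.sum()
        learners.append(g)
        betas.append(beta)

    def predict(X_new):
        # f_M(x) = sum_m beta_m * G_m(x); the predicted class is its sign
        f = sum(b * g.predict(X_new) for b, g in zip(betas, learners))
        return np.sign(f)

    return predict
```

For example, predict = adaboost_exponential(X_train, y_train, M=100) returns a function that evaluates the sign of f_M on new samples.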