Welcome back to Pattern Recognition. Today we want to look into a particular class of classification algorithms: boosting algorithms, which try to unite multiple weak classifiers into one strong classifier. In particular, we want to look into AdaBoost.
So let's have a look at the idea behind AdaBoost. Boosting generally tries to build a more powerful classifier from weak classifiers, and you could say it is one of the most powerful learning techniques introduced in the last 20 years. It can be applied to any kind of classification system, and it can give you a bit of additional performance in situations where you really want to squeeze out the last couple of percent.
The idea is to combine the output of many weak classifiers into a powerful committee, essentially using the wisdom of the crowd. The most popular boosting algorithm is called AdaBoost and was introduced in 1997. A weak classifier is one whose error rate is only slightly better than random guessing, but you do need a classifier that is better than random guessing: boosting won't work with classifiers that are worse than random guessing. The idea of boosting is to sequentially apply the weak classifier to repeatedly modified versions of the data. This produces a sequence of classifiers, and a weighted majority vote yields the final prediction. We now consider a two-class problem with the class being either minus one or plus one. Given observations D, each consisting of a sample and its class, we can train a classifier G(x). This classifier then has an error on the training data set, which can be computed as the average error: one over N times the sum over all misclassifications, where I is the indicator function that returns one if its argument is true and zero otherwise.
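Written out, with the observations denoted as pairs (x_i, y_i) for i = 1, ..., N (a notational choice made here for compactness), the average training error just described is

\[
\overline{\mathrm{err}} = \frac{1}{N} \sum_{i=1}^{N} I\bigl(y_i \neq G(x_i)\bigr).
\]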
Now we can sequentially apply weak classifiers to produce a sequence G_1, ..., G_M of such classifiers and combine them in a sum: the final classification G(x) is given as the sign of the weighted sum of the individual classifiers. For this we need weighting factors alpha_1 to alpha_M; they are computed by the boosting algorithm, and each alpha_m weighs the output of the corresponding classifier.
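As a formula, the committee prediction just described is the sign of the weighted sum of the M individual classifiers:

\[
G(x) = \operatorname{sign}\!\left(\sum_{m=1}^{M} \alpha_m\, G_m(x)\right).
\]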
Visually, you could summarize the boosting algorithm like this: you start by training some classifier, then you compute its error on the training set, reweight the samples, train a new classifier, compute its error on the training set, reweight the samples, and so on, until you stop at the point where you have trained M classifiers. Each boosting step consists of applying weights to the training samples that amplify the weight of misclassified samples; initially, the weights are simply uniformly distributed. The first classifier in the sequence is trained in the usual way, and for m greater than or equal to 2 the weights are individually modified, so the classifiers G_m with m greater than or equal to 2 are trained on differently weighted samples. The weighting scheme at step m looks at the observations that were misclassified by the previous classifier and increases their weights, while correctly classified samples get decreased weights. This means that observations that are difficult to classify gain ever-increasing influence over the iterations: each successive classifier is forced to concentrate more on those observations that were misclassified by the previous ones.
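The lecture describes this reweighting qualitatively; in the standard discrete AdaBoost formulation (the exact form is an assumption here), the update of sample weight w_i at step m reads

\[
w_i \;\leftarrow\; w_i \cdot \exp\!\Bigl(\alpha_m\, I\bigl(y_i \neq G_m(x_i)\bigr)\Bigr),
\]

so misclassified samples are scaled up by the factor e^{alpha_m}, and after renormalizing the weights to sum to one, the correctly classified samples end up with relatively smaller weights.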
This brings us to the AdaBoost iteration scheme. You initialize your weights with a uniform distribution and set the iteration counter m to 1. You fit the first classifier to the training data set using the current weights. Then you compute the classification error, which is essentially a weighted sum of the misclassifications. Next you update the classifier weight; here we use the classification error to guide us on which classifier to trust more and which to trust less. Then we compute new sample weights, again using the classification loss of our new joint classifier, and in this way we iterate over and over again until we have built the desired number of classifiers, in our example M.
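Written as formulas (again using the standard discrete AdaBoost notation as an assumption consistent with the verbal description), the weighted classification error and the classifier weight at iteration m are

\[
\mathrm{err}_m = \frac{\sum_{i=1}^{N} w_i\, I\bigl(y_i \neq G_m(x_i)\bigr)}{\sum_{i=1}^{N} w_i},
\qquad
\alpha_m = \ln\!\left(\frac{1 - \mathrm{err}_m}{\mathrm{err}_m}\right),
\]

so a classifier with a small weighted error receives a large weight alpha_m in the final vote. The sample weights are then updated as shown above, and m is incremented.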
Finally, we output the completely trained classifier. So you see that in every iteration of AdaBoost we have to train a new classifier. This version of AdaBoost is called the discrete version, because each classifier returns a discrete class label. AdaBoost can also be modified to return real-valued outputs.
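To make the iteration scheme concrete, here is a minimal sketch of discrete AdaBoost in Python. This is not the lecture's reference implementation; the weak learner (a scikit-learn decision stump, DecisionTreeClassifier with max_depth=1) and the toy data in the usage example are assumptions made purely for illustration.

```python
# Minimal sketch of discrete AdaBoost (illustrative, not the lecture's code).
# Assumptions: labels are in {-1, +1}; weak learners are decision stumps.
import numpy as np
from sklearn.tree import DecisionTreeClassifier


def adaboost_fit(X, y, M=50):
    """Train a sequence of M weak classifiers G_1, ..., G_M with weights alpha_m."""
    N = len(y)
    w = np.full(N, 1.0 / N)                     # 1. initialize sample weights uniformly
    classifiers, alphas = [], []
    for m in range(M):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)        # 2. fit weak classifier with current weights
        miss = (stump.predict(X) != y).astype(float)
        err = np.clip(np.sum(w * miss) / np.sum(w), 1e-10, 1 - 1e-10)  # 3. weighted error
        alpha = np.log((1.0 - err) / err)       # 4. classifier weight (how much to trust it)
        w = w * np.exp(alpha * miss)            # 5. amplify weights of misclassified samples
        w /= np.sum(w)                          #    and renormalize
        classifiers.append(stump)
        alphas.append(alpha)
    return classifiers, alphas


def adaboost_predict(X, classifiers, alphas):
    """Final prediction: sign of the weighted sum of the weak classifiers' outputs."""
    agg = sum(alpha * clf.predict(X) for alpha, clf in zip(alphas, classifiers))
    return np.sign(agg)


# Usage example on synthetic two-class data:
if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 2))
    y = np.where(X[:, 0] + X[:, 1] > 0, 1, -1)
    clfs, alphas = adaboost_fit(X, y, M=20)
    accuracy = np.mean(adaboost_predict(X, clfs, alphas) == y)
    print(f"training accuracy: {accuracy:.3f}")
```

The numbered comments mirror the steps of the iteration scheme described above.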
In this video, we introduce the AdaBoost algorithm, which fuses many weak classifiers into a strong one.
This video is released under CC BY 4.0. Please feel free to share and reuse.