Welcome back to pattern recognition. So today we want to look a bit into model
assessment and in particular we want to talk about the no free lunch theorem and the bias
variance trade-off.
So let's see how we can actually assess different models. In the past lectures we have seen
many different learning algorithms and classification techniques and we've seen that they have different
properties like low computational complexity, we can incorporate prior knowledge, we have
algorithms that are linear and non-linear ones and we've seen the optimality with respect
to certain cost functions. So some of these methods try to compute smooth decision boundaries,
some of them rather non-smooth decision boundaries. But what we really have to ask is: are there
any reasons to favor one algorithm over another? And this brings us to the no free lunch theorem.
The no free lunch theorem considers the following setting: given a cost function f from the space
of possible cost functions, an algorithm a, and the cost c_m observed on a specific sample after m
iterations, the performance of the algorithm is the conditional probability P(c_m | f, m, a) of the
cost given the cost function, the iteration, and the algorithm. The no free lunch theorem now states
that for any two algorithms a1 and a2 this performance is equivalent when summed over all possible
cost functions: Σ_f P(c_m | f, m, a1) = Σ_f P(c_m | f, m, a2). This has a couple of consequences for the
classification methods. If no prior assumptions about the problem are made there is no overall
superior or inferior classification method. So if you don't know what the application
is and if you don't know how the cost is actually generated there is no way of picking
the best algorithm. So generally we should be skeptical regarding studies that demonstrate
the overall superiority of a particular method. If a study shows that a method is better suited
for one particular problem, that is something you can believe. But a paper claiming that one
algorithm is better than another on all problems would contradict the no free lunch theorem. So we have to
focus on the aspects that matter most for the classification problem: the prior information, which
is very relevant to improve your classification; the data distribution, so you really have to know
how your data is distributed and how it behaves; the amount of training data, which of course is
very relevant for the performance of the classifier; and the cost function, i.e., the purpose for
which you are actually designing the classifier.
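To make the no free lunch theorem concrete, here is a small simulation; a minimal sketch under assumed toy conditions (a four-point binary input domain, two deterministic learners, all names hypothetical, not from the lecture). Averaged over all possible target functions, a sensible learner and a deliberately bad one achieve exactly the same off-training-set accuracy:

```python
import itertools

# Toy binary domain: all names and values here are illustrative assumptions.
X = [0, 1, 2, 3]        # the whole input domain
train_x = [0, 1]        # training inputs (labels filled in per target)
test_x = [2, 3]         # off-training-set inputs

def majority(train_labels):
    """Predict the majority training label everywhere off the training set."""
    guess = 1 if sum(train_labels.values()) * 2 > len(train_labels) else 0
    return lambda x: guess

def minority(train_labels):
    """Predict the minority training label -- a deliberately 'bad' learner."""
    guess = 0 if sum(train_labels.values()) * 2 > len(train_labels) else 1
    return lambda x: guess

def avg_ots_accuracy(algorithm):
    """Average off-training-set accuracy over ALL 2^4 target functions F."""
    targets = list(itertools.product([0, 1], repeat=len(X)))
    total = 0.0
    for labels in targets:
        F = dict(zip(X, labels))                  # one possible target function
        h = algorithm({x: F[x] for x in train_x}) # train on its training labels
        total += sum(h(x) == F(x) if callable(F) else h(x) == F[x]
                     for x in test_x) / len(test_x)
    return total / len(targets)

print(avg_ots_accuracy(majority))  # 0.5 -- identical for both learners
print(avg_ots_accuracy(minority))  # 0.5
```

Both learners average exactly 0.5: with no prior assumption about which target function generated the data, every fixed prediction is right on half of the equally weighted targets.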
If we consider the off-training set error, we are able to measure the performance on data that we
have not seen during training. So this is very relevant to assess the performance on unseen data
samples: we compute the error on samples that have not been contained in the training set, and for
large training sets the off-training set error should be small. We use it to compare the general
classification performance of algorithms for a particular problem. Now if you consider a two-class problem with training
data D consisting of patterns x_i and labels y_i, the labels are generated by an unknown target
function F, i.e., y_i = F(x_i). The expected off-training set classification error for the k-th
learning algorithm can then be written as the expected value E_k(E | F, n), computed as a sum over
all observations x outside the training set:

E_k(E | F, n) = Σ_{x ∉ D} P(x) [1 − δ(F(x), h(x))] P_k(h(x) | D),

where P(x) is the probability of x, δ is the Kronecker delta, h is the hypothesis the algorithm
learned from the data, and P_k(h(x) | D) is the probability that algorithm k produces hypothesis
h given the training data D. So E here essentially is the error that is
caused by the hypothesis. There are different ways of categorizing learning systems, and you can
essentially separate them into possible and impossible learning systems. Possible ones are: you
have one positive example and many negative ones; you have many positive and many negative ones;
or you even have positive and negative examples plus an outlier/rejection class, indicated with
zeros here. Then there are also impossible learning systems: if you only have positive samples, you
cannot make any assumptions about the negative class; you can only do things like measuring the
distance to the positive samples. You can also allow rejection, but it won't tell you anything
about the negative class. Essentially, if you are missing any information about the other class,
then you cannot learn a decision boundary between the two classes.
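The expected off-training set error formula above can also be sketched in code; a minimal sketch, assuming a deterministic learner so that P_k(h(x) | D) puts all its mass on a single hypothesis h, with all names and toy values hypothetical:

```python
def ots_error(P, F, h, train_x, domain):
    """Off-training-set error of a deterministic hypothesis h against target F:
    the sum of P(x) * [1 - delta(F(x), h(x))] over all x outside the training
    set, i.e. the probability mass of unseen inputs that h misclassifies."""
    return sum(P[x] * (0 if F(x) == h(x) else 1)
               for x in domain if x not in train_x)

# Toy example (values hypothetical): uniform P over four inputs,
# training inputs {0, 1}, so the off-training-set inputs are {2, 3}.
P = {x: 0.25 for x in range(4)}
F = lambda x: x % 2          # unknown target function
h = lambda x: 0              # hypothesis returned by some learner
print(ots_error(P, F, h, {0, 1}, range(4)))  # 0.25 (h is wrong only on x = 3)
```

Here h agrees with F on x = 2 but disagrees on x = 3, so the off-training set error is exactly the probability mass P(3) = 0.25.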
Presenters
Accessible via
Open Access
Duration
00:13:46 min
Recording date
2020-11-16
Uploaded on
2020-11-17 00:08:55
Language
en-US
In this video, we introduce the "no free lunch" theorem and the bias-variance trade-off.
This video is released under CC BY 4.0. Please feel free to share and reuse.
Music Reference: Damiano Baldoni - Thinking of You