39 - Pattern Recognition [PR] - PR 35 [ID:24035]

Welcome back to Pattern Recognition. Today we want to look a bit into model assessment, and in particular we want to talk about the no free lunch theorem and the bias-variance trade-off.

So let's see how we can actually assess different models. In the past lectures we have seen many different learning algorithms and classification techniques, and we've seen that they have different properties: some have low computational complexity, some let us incorporate prior knowledge, some are linear and others non-linear, and some are optimal with respect to certain cost functions. Some of these methods try to compute smooth decision boundaries, others rather non-smooth ones. But what we really have to ask is: are there any reasons to favor one algorithm over another? This brings us to the no free lunch theorem.

The no free lunch theorem considers a cost function f living in the space of possible cost functions, an algorithm a, and the cost c_m measured on a specific sample after m iterations. The performance of an algorithm is then the conditional probability of the cost given the cost function, the number of iterations, and the algorithm, i.e. P(c_m | f, m, a). The no free lunch theorem now states that for any two algorithms a1 and a2, the sums of these probabilities over all possible cost functions are identical.
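Written compactly (a sketch in the notation just described; the slides may use slightly different symbols), the theorem states that, summed over all possible cost functions f, the two algorithms are equivalent:

```latex
\sum_{f} P(c_m \mid f, m, a_1) \;=\; \sum_{f} P(c_m \mid f, m, a_2)
```

In other words, averaged uniformly over all conceivable problems, the performance distributions of any two learning algorithms are identical.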

This has a couple of consequences for the classification methods. If no prior assumptions about the problem are made, there is no overall superior or inferior classification method. So if you don't know what the application is, and if you don't know how the cost is actually generated, there is no way of picking the best algorithm. Generally, we should therefore be skeptical regarding studies that demonstrate the overall superiority of a particular method. If a method is shown to be better suited for one particular problem, then this is probably something that you can believe. But if a paper claims that one algorithm is better than another for all problems, that would be a contradiction to the no free lunch theorem.

So we have to focus on the aspects that matter most for the classification problem. There is the prior information, which is very relevant for improving your classification. There is the data distribution, so you really have to know how your data is distributed and how it behaves. Then of course the amount of training data is very relevant for the performance of the classifier, and finally the cost function, i.e. the purpose for which you are actually designing the classifier.

If we consider the off-training-set error, we are able to measure the performance outside the data that we have seen during training. This is very relevant for measuring the performance on unseen data samples. Here we need to compute the error on samples that have not been contained in the training set, and for large training sets the off-training-set error should be small. So we use it to compare the general classification performance of different methods for a particular problem.
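As a small illustration of this idea, here is a minimal Python sketch (a hypothetical helper, not from the lecture, assuming a classifier with scikit-learn-style fit/predict methods) that trains on a subset of the data and measures the error only on the remaining, unseen samples:

```python
import numpy as np

def off_training_set_error(classifier, X_all, y_all, train_idx):
    """Misclassification rate measured only on samples that were
    NOT part of the training set (the off-training-set samples)."""
    train_idx = np.asarray(train_idx)
    off_mask = np.ones(len(X_all), dtype=bool)
    off_mask[train_idx] = False              # keep only unseen samples

    classifier.fit(X_all[train_idx], y_all[train_idx])
    y_pred = classifier.predict(X_all[off_mask])
    return np.mean(y_pred != y_all[off_mask])
```

Any held-out test set approximates this quantity, as long as the test samples really do not appear in the training data.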

Now consider a two-class problem with training data D consisting of patterns x_i and labels y_i, where each y_i is generated by an unknown target function F, i.e. y_i = F(x_i). The expected off-training-set classification error of the k-th learning algorithm is then the expected value of the error E given F and the number of training samples n. We compute it as a sum over all observations x outside the training set: the probability P(x), times one minus the Kronecker delta of F(x) and h(x), where h is the hypothesis that the algorithm has learned from the data, times the probability P_k(h(x) | D) that algorithm k produces this hypothesis given D. So E here is essentially the error that is caused by the hypothesis.
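Written out (following the standard textbook formulation; the exact notation on the slides may differ slightly), the expected off-training-set error reads:

```latex
\mathcal{E}_k(E \mid F, n) \;=\; \sum_{x \notin D} P(x)\,\bigl[\,1 - \delta\bigl(F(x), h(x)\bigr)\bigr]\; P_k\bigl(h(x) \mid D\bigr)
```

Here δ(·,·) is the Kronecker delta, so the bracket is 1 exactly when the hypothesis h disagrees with the target F at x, and these misclassifications are weighted by how likely the k-th algorithm is to produce that hypothesis from the training data D.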

There are different ways of categorizing learning problems, and you can essentially separate them into possible and impossible learning systems. Possible ones are, for example: one positive example and many negative ones; many positive and many negative examples; or positive and negative examples plus outliers in a rejection class, indicated with zeros here. Then there are also impossible learning systems. If you only have positive samples, you can't make any assumptions about the negative class; you can only do things like measuring the distance to the positive samples. You can also include rejection, but it won't tell you anything about the negative class. Essentially, if you are missing any observation of, and any information about, the other class, then learning a decision boundary between the two classes is not possible.
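To make the positive-only case more concrete, here is a minimal Python sketch (hypothetical, not from the lecture) of the "distance to the positive samples" idea: with no negative examples we can only threshold a distance to accept or reject a new sample, and a rejection says nothing about which other class the sample might belong to:

```python
import numpy as np

def accept_as_positive(X_pos, x_new, threshold):
    """One-class decision using only positive training samples X_pos:
    accept x_new if it lies close enough to at least one positive sample,
    otherwise reject it (without assigning it to any negative class)."""
    distances = np.linalg.norm(np.asarray(X_pos) - np.asarray(x_new), axis=1)
    return distances.min() <= threshold
```

For example, with X_pos = [[0, 0], [1, 1]] and threshold 0.5, the point [0.1, 0.1] would be accepted, while [3, 3] would be rejected.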

Part of a video series
Accessible via: Open access
Duration: 00:13:46 min
Recording date: 2020-11-16
Uploaded: 2020-11-17 00:08:55
Language: en-US

In this video, we introduce the "no free lunch" theorem and the bias-variance trade-off.

This video is released under CC BY 4.0. Please feel free to share and reuse.

For reminders to watch the new video follow on Twitter or LinkedIn. Also, join our network for information about talks, videos, and job offers in our Facebook and LinkedIn Groups.

Music Reference: Damiano Baldoni - Thinking of You
