13 - Musteranalyse/Pattern Analysis (formerly Mustererkennung 2) (PA) [ID:388]

So welcome everybody to the Monday session. I hope you enjoyed the weekend and you are

ready to go with the second part on support vector machines. So we have singular value decomposition, SVD, all of you are using SVN, and now we are talking about SVM, support vector machines.

Before I continue with the topic of support vector machines, let's briefly reconsider where we currently are and what the big picture of this lecture is. As all of you know, medical... sorry, pattern analysis; if you teach two lectures in parallel, you are always switching back and forth. Pattern analysis is basically all about modeling the a posteriori probability P(y | x).

That's basically the key issue, and the x here can be a vector like we had it in the winter semester, but we will see soon that it can also be a sequence of features that varies in length, or a set of features of varying cardinality. So there will be a high degree of flexibility later on in the features that we are using, but so far we are still fighting with problems where x is just a vector of fixed dimension; the dimension of our feature vectors is always denoted by d. That's the current state where we are. We talked about the Bayesian classifier at the beginning and its role in pattern recognition. We have learned that given a zero-one cost function, the decision rule that maximizes the a posteriori probability turns out to provide the optimal classifier.

We talked about various approaches to model P(y | x), and we know that this is, up to scaling, P(y) times P(x | y); here P(y) is the so-called prior probability, P(y | x) is the posterior probability, and P(x | y) is the class conditional PDF.
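To keep the notation explicit, the decision rule and the decomposition being recalled here can be written as follows (a compact restatement of what was just said, not a new result):

```latex
% MAP decision rule under the zero-one cost function:
\hat{y} = \arg\max_{y} \; p(y \mid x)
        = \arg\max_{y} \; \frac{p(y)\, p(x \mid y)}{p(x)}
        = \arg\max_{y} \; p(y)\, p(x \mid y)
% p(x) does not depend on y, so it can be dropped ("up to scaling").
```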

We have seen different approaches to model the class conditional PDF; for instance, we talked about the naive Bayes classifier. What was the idea of the naive Bayes? Steve?

Not necessarily; a naive Bayes classifier is not required to have a linear decision boundary.

In which case does the naive Bayes provide a linear decision boundary?

That's the core assumption of the naive Bayes: it means that P(x | y) can be factorized into the product of the component densities P(x_i | y). But this does not necessarily give us a linear decision boundary. It depends basically on how we choose these components in the product; if these are Gaussians, for instance, you will get curved, degree-two decision boundaries.
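In symbols, the factorization referred to here is the standard naive Bayes assumption over the d components of the feature vector:

```latex
% Naive Bayes: the class conditional PDF factorizes over the
% components x_1, ..., x_d of the feature vector x
p(x \mid y) = \prod_{i=1}^{d} p(x_i \mid y)
```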

If the classes share the same variances, then we get a linear decision boundary.
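This is not derived on the board in the recording, but the cancellation behind it is easy to sketch for two classes with Gaussian components whose variances σ_i² do not depend on the class: the quadratic terms in x_i cancel in the log posterior ratio, leaving a boundary that is linear in x.

```latex
\log \frac{p(y_1 \mid x)}{p(y_2 \mid x)}
  = \log \frac{p(y_1)}{p(y_2)}
  + \sum_{i=1}^{d} \left[ \frac{(\mu_{1,i} - \mu_{2,i})\, x_i}{\sigma_i^2}
  - \frac{\mu_{1,i}^2 - \mu_{2,i}^2}{2\,\sigma_i^2} \right]
```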

Now I give you a set of feature vectors. They are 100,000-dimensional, and I tell you, as your boss: we don't have the money for building a smart classifier, so work out a naive Bayes classifier based on these feature vectors that I recently bought somewhere else. How would you build a naive Bayes classifier based on these 100,000-dimensional features? How would you start the classification business? What would you do? That's always right if you do pattern recognition. This requires an independence assumption.

That means that the components of our feature vector should be mutually independent. If I provide a 100,000-dimensional feature vector, the probability that the independence assumption is fulfilled is zero. So we should work out a strategy for getting feature vectors out of these 100,000-dimensional feature vectors whose components are basically mutually independent, and there is one.

Thank you. PCA or LDA, because it decorrelates the signal, and afterwards you can also apply dimensionality reduction, so you get more flexible features.

Apply PCA. PCA has three advantages. First of all, it brings down the high-dimensional feature vectors to low-dimensional feature vectors; we know that in high dimensions we usually have problems. I should also explain the curse of dimensionality in more detail here, by the way. The second advantage of PCA is that the features you get are uncorrelated; they even turn out to be independent. You decorrelate things by applying PCA. And the third thing, and that's also something you have seen in the exercises, is that the components of the feature vectors you get, what type of distribution do they follow? They are approximately Gaussian, because you have a sum, a linear combination, of random variables, and then we know from statistics that this tends toward a Gaussian distribution by the central limit theorem.
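To make this strategy concrete, here is a minimal numpy sketch, not taken from the lecture; the helper names (pca_reduce, fit_gaussian_naive_bayes, predict) are illustrative. It projects the data onto the leading principal components and then fits independent per-component Gaussians per class, which is exactly the naive Bayes factorization applied to the decorrelated, reduced features.

```python
import numpy as np

def pca_reduce(X, k):
    """Project the rows of X onto the k leading principal components."""
    X_centered = X - X.mean(axis=0)
    cov = np.cov(X_centered, rowvar=False)            # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)            # eigenvalues in ascending order
    top = eigvecs[:, np.argsort(eigvals)[::-1][:k]]   # k leading eigenvectors as columns
    return X_centered @ top                           # decorrelated, k-dimensional features

def fit_gaussian_naive_bayes(Z, y):
    """Estimate a log prior and per-component mean/variance for every class."""
    params = {}
    for c in np.unique(y):
        Zc = Z[y == c]
        params[c] = (np.log(len(Zc) / len(Z)),        # log prior
                     Zc.mean(axis=0),                 # per-component means
                     Zc.var(axis=0) + 1e-9)           # per-component variances (regularized)
    return params

def predict(Z, params):
    """Classify each row of Z by the maximum log posterior (up to a constant)."""
    classes = list(params.keys())
    scores = []
    for c in classes:
        log_prior, mu, var = params[c]
        # Independence assumption: sum of per-component Gaussian log likelihoods
        log_lik = -0.5 * np.sum(np.log(2 * np.pi * var) + (Z - mu) ** 2 / var, axis=1)
        scores.append(log_prior + log_lik)
    return np.array(classes)[np.argmax(np.stack(scores), axis=0)]
```

In practice the projection (mean and eigenvectors) would be estimated on the training data and reused for test vectors; the sketch omits that bookkeeping for brevity.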

Accessible via: Open Access
Recording date: 2009-06-08
Uploaded on: 2025-09-30 08:52:01
Language: en-US
Tags: Analyse PA Optimization Support Vector Duality Primal Lagrange Slater's Karush Kuhn Tucker Optimality Conditions problem