So welcome everybody to the Monday session. I hope you enjoyed the weekend and you are
ready to go with the second part on support vector machines. So we have singular value decomposition, SVD, all of you are using SVN, and now we are talking about SVM, support vector machines.
Before I continue with the topic of support vector machines, let's briefly reconsider where we currently are and what the big picture of this lecture is. As all of you know, medical... sorry, pattern analysis; if you teach two lectures in parallel, you are always switching back and forth. Pattern analysis is basically all about the modeling of the a posteriori probability. That's the key issue, and this x here can be a vector like we had in the winter semester, but we will soon see that it can also be a sequence of features, where the sequences can vary in length, or a set of features of varying cardinality. So there will be a high degree of flexibility later on in the features that we are using, but so far we are still dealing with problems where x is just a vector of fixed dimension; the dimension of our feature vectors is always denoted by d. That's the current state of where we are. We talked about the Bayesian classifier at the beginning and
its role in pattern recognition. We have learned that, given a zero-one cost function, the decision rule that maximizes the a posteriori probability turns out to provide the optimal classifier. We talked about various approaches to model P(y | x), and we know this is, up to scaling, P(y) times P(x | y); P(y) is the so-called prior probability, P(y | x) is the posterior probability, and P(x | y) is the class-conditional PDF. We have seen different approaches to model the class-conditional PDF; for instance, we talked about the naive Bayes classifier.
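As a written aside to the spoken argument (the hat notation for the decision is my own shorthand), the zero-one-cost decision rule just described can be summarized as

\[
\hat{y} \;=\; \operatorname*{argmax}_{y}\, P(y \mid x) \;=\; \operatorname*{argmax}_{y}\, \frac{P(y)\, p(x \mid y)}{p(x)} \;=\; \operatorname*{argmax}_{y}\, P(y)\, p(x \mid y),
\]

since the evidence p(x) does not depend on the class y and therefore only rescales the posterior.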
What was the idea of the naive Bayes? Steve?
Not necessarily; naive Bayes is not required to have a linear decision boundary. In which case does the naive Bayes provide a linear decision boundary?
That's the core assumption of the naive Bayes: P(x | y) can be factorized into a product over the components, P(x_i | y), but this does not necessarily give us a linear decision boundary. It depends on how we choose the components in this product; if these are Gaussians, for instance, you will get curved, degree-two decision boundaries. If the classes share the same variances, then we get a linear decision boundary.
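As a written aside (the component notation and the symbols for the means and variances are my own choice), the factorization and the special case just mentioned can be made explicit for a d-dimensional feature vector x = (x_1, ..., x_d)^T with Gaussian components:

\[
p(x \mid y) \;=\; \prod_{i=1}^{d} p(x_i \mid y), \qquad p(x_i \mid y) \;=\; \mathcal{N}\!\left(x_i;\, \mu_{y,i},\, \sigma_{y,i}^2\right).
\]

In the log-posterior, each component contributes a term $-\frac{(x_i - \mu_{y,i})^2}{2\sigma_{y,i}^2}$. If the variances are shared by all classes, $\sigma_{y,i}^2 = \sigma_i^2$, the quadratic terms in $x_i$ are identical for every class and cancel when two classes are compared, so the decision boundary is linear in x; with class-specific variances the quadratic terms remain and the boundary is a curve of degree two.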
Now I give you a set of feature vectors, each a 100,000-dimensional feature vector, and as your boss I tell you we don't have the money for building a smart classifier, so work out a naive Bayes classifier based on these feature vectors that I recently bought somewhere else. How would you build a naive Bayes classifier based on these 100,000-dimensional features? How would you start the classification business? What would you do? That's always right if you do pattern recognition. This requires an independence assumption; that means that in our feature vector the components should be mutually independent. If I provide a 100,000-dimensional feature vector, the probability that the independence assumption is fulfilled is zero. So we should work out a strategy for how we can get feature vectors out of these 100,000-dimensional feature vectors whose components are mutually independent, and there is one.
You could use the PCA or LDA, because it decorrelates the signal, and afterwards you can also apply dimensionality reduction, so you get more flexible features.
Apply PCA. PCA has three advantages. First of all, it brings the high-dimensional feature vectors down to low-dimensional feature vectors. We know that in high dimensions we usually run into problems; by the way, I should also explain the curse of dimensionality in more detail here. The second advantage of PCA is that the features you get are uncorrelated; they turn out to be independent. You decorrelate things by applying PCA.
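As a written aside in my own notation: if the covariance matrix of the original features has the eigendecomposition $\Sigma_x = \Phi \Lambda \Phi^{T}$, then the PCA-transformed features

\[
\tilde{x} \;=\; \Phi^{T} (x - \mu_x)
\]

have covariance $\Phi^{T} \Sigma_x \Phi = \Lambda$, which is diagonal, so the transformed components are uncorrelated; keeping only the eigenvectors belonging to the largest eigenvalues additionally reduces the dimension. Strictly speaking, uncorrelated is weaker than independent, but together with the approximate Gaussianity discussed next, uncorrelated components are also independent.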
And the third thing, which is also something you have seen in the exercises, concerns the components of the feature vectors you get: what type of distribution do they follow? They are approximately Gaussian, because each component is a sum, a linear combination, of random variables, and we know from statistics, by the central limit theorem, that such a sum tends toward a Gaussian.
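A minimal sketch of the strategy just described, assuming scikit-learn is available; the dimensions, the number of retained components, and the synthetic data are illustrative choices of mine, not values from the lecture:

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

# Illustrative synthetic data: n samples of dimension d (the lecture's example
# uses d = 100,000; here d is kept small so the sketch runs quickly).
rng = np.random.default_rng(0)
n, d = 500, 1000
X = rng.normal(size=(n, d))
y = (X[:, :5].sum(axis=1) > 0).astype(int)   # labels depending on a few directions

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# PCA brings the features down to a low dimension and decorrelates them,
# which makes the naive Bayes independence assumption far more plausible;
# GaussianNB then models each transformed component with a 1-D Gaussian.
clf = make_pipeline(PCA(n_components=20), GaussianNB())
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))
```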