22 - Musteranalyse/Pattern Analysis (PA) [ID:2329]

So welcome to the Monday session. Unfortunately, this is the last week of the semester, so we have to conclude the topics that we have been discussing. I would have loved to do twice as much as usual.

And I think you are now on a very good level, so that we could really do cool stuff. But, well, that's the end of the second part of pattern recognition or pattern analysis.

So you have to be happy with what you have learned so far. And for the rest of the week, what is the program? First of all, we will do again our mind map.

Then I will discuss the core questions or key questions of hidden Markov models a little bit. And I will explain to you especially the forward algorithm that is used if you want to implement these things.

And then I would like to show you also a little bit the theory of Markov random fields that is heavily used as a statistical modeling principle in image processing.

So let's start with the mind map. I think it was page nine.

The mind map. Think about what we did in the winter semester and what we did in the summer semester. In the summer semester, we focused on a very, well, slim area of pattern recognition.

We looked at ways to characterize the a posteriori probability. And why is the a posteriori probability that important? Well, we know that this is the core of the Bayes classifier.

And the Bayes classifier uses the decision rule, well, I decide for the class with the highest posterior probability.
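Written out as a formula (standard notation, added here for reference rather than quoted from the lecture), the Bayes decision rule reads

\[
\hat{y} \;=\; \operatorname*{argmax}_{y}\; p(y \mid x)
       \;=\; \operatorname*{argmax}_{y}\; \frac{p(y)\,p(x \mid y)}{p(x)}
       \;=\; \operatorname*{argmax}_{y}\; p(y)\,p(x \mid y),
\]

where the evidence p(x) can be dropped because it does not depend on the class y.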

And then we have discussed different ways to model the posteriors using not feature sequences or feature sets, but feature vectors of limited dimension. Or I should even say fixed dimension. So the dimension does not change with the different signals that you are using for doing the classification.

Of course, that demand defines a very strong constraint, because if you record speech, then people, you know, have different pronunciation, different emphasis, and they stretch the words differently.

So, I mean, you cannot associate one fixed feature vector with the utterance, but you have to decompose the signal into frames and you get a sequence of features.

So you cannot apply all that theory there. Or if you do object classification and you look at an object under varying illumination and run a corner detector on the images,

I mean, depending on the position and orientation of the light source, you will find a different number of vertices, of corners, in the image.

So the number of corners varies. And if you use the corners for classification, then you cannot work with a single feature vector, but you will get a set of features.

Or if you use the edges or something like that, they vary with the illumination.

So that's a very, very, very strong constraint, the fixed dimension. But if you look into the theory of pattern recognition and pattern analysis, you will find out that many, many theoretical results are on the basis of classification problems that have a fixed dimension of feature vectors.

And what did we consider in this context? If we keep the dimension of feature vectors fixed?

Well, we have modeled p(y | x) directly using logistic regression.

We have modeled p(y) times p(x | y), where p(y) is the data-independent prior. So this part is data independent.

And p(x | y) is the class-conditional density function. And we have seen various approaches for dealing with it.

We have, for instance, said, OK, we use a Gaussian here.

Or we have seen that we model this by a linear decision boundary.
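In formulas (generic notation, not quoted from the lecture; the parameters \theta, \theta_0, \mu_y, \Sigma_y are placeholders): the discriminative route models the posterior directly, for instance with logistic regression in the two-class case, while the generative route models the prior and the class-conditional density separately, for instance with Gaussians:

\[
p(y = 1 \mid x) \;=\; \frac{1}{1 + \exp\!\big(-(\theta^{\top} x + \theta_0)\big)}
\qquad\text{versus}\qquad
p(y \mid x) \;\propto\; p(y)\,\mathcal{N}(x;\, \mu_y, \Sigma_y).
\]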

When do we get a linear decision boundary?

DB.

Linear decision boundary, DB stands for decision boundary. When do we get a linear decision boundary in the context of statistical classifiers?

Well, we get a linear decision boundary if all the Gaussians share the same covariances. We call that parameter tying.

So tied covariances.

And take your time while preparing for the oral exam and think about what happens if I tie the Gaussians, what happens if I have only diagonal matrices?

And if I tie parts of the diagonal matrices? Play around with the toolkit that we provided in the exercises and see how things change.
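As a short sketch of why tying gives a linear boundary (generic notation, assuming two classes with Gaussian class-conditional densities that share one covariance matrix \Sigma): in the log-ratio of the posteriors the quadratic term -\tfrac{1}{2} x^{\top} \Sigma^{-1} x cancels,

\[
\log \frac{p(y_1 \mid x)}{p(y_2 \mid x)}
\;=\; (\mu_1 - \mu_2)^{\top} \Sigma^{-1} x
\;-\; \tfrac{1}{2}\big(\mu_1^{\top} \Sigma^{-1} \mu_1 - \mu_2^{\top} \Sigma^{-1} \mu_2\big)
\;+\; \log \frac{p(y_1)}{p(y_2)},
\]

which is affine in x, so the decision boundary is a hyperplane. With class-specific covariances the quadratic terms remain and the boundary becomes quadratic.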

What else did we discuss? We also said, well, we have a high-dimensional feature vector and maybe we can break the density down into a product over the feature components, assuming that the components are mutually statistically independent.

And that is for sure the case if the PDF factorizes this way.

And if we, for instance, have generated the feature vectors by PCA, you remember that PCA transforms your high dimensional feature vectors to lower dimensional ones.

And you will end up with normally distributed feature vectors. Sounds a little like a miracle.

But we did experiments in the exercises where we generated the components of a high-dimensional feature vector uniformly in an interval from zero to one, for instance.

And then we projected it by PCA and we observed, well, this looks like a Gaussian. Right.

And these components are even mutually independent. So by design of the features, we can apply, at the end of the day, the naive Bayes classifier. Naive Bayes.

That's a very kind wording for the idiot's classifier, right. Idiot Bayes.
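As a minimal sketch of that exercise in Python (my reconstruction, not the course toolkit; the dimensions, the class shift, and the use of NumPy, SciPy, and scikit-learn are assumptions): draw vectors with uniformly distributed components, project them with PCA, check how Gaussian the projected components look, and fit a Gaussian naive Bayes classifier on top.

    import numpy as np
    from scipy import stats
    from sklearn.decomposition import PCA
    from sklearn.naive_bayes import GaussianNB

    rng = np.random.default_rng(0)

    # Two classes, 50-dimensional, components drawn uniformly from [0, 1);
    # class 1 is shifted a little so there is something to classify.
    n, d = 1000, 50
    X0 = rng.uniform(0.0, 1.0, size=(n, d))
    X1 = rng.uniform(0.1, 1.1, size=(n, d))
    X = np.vstack([X0, X1])
    y = np.array([0] * n + [1] * n)

    # Project to a few principal components; each projected component is a
    # weighted sum of many roughly independent uniform variables, hence
    # close to normally distributed.
    Z = PCA(n_components=5).fit_transform(X)

    # "This looks like a Gaussian": skewness and excess kurtosis of the
    # projected components within class 0 should both be close to zero.
    for k in range(Z.shape[1]):
        z = Z[:n, k]
        print(f"component {k}: skew {stats.skew(z):+.3f}, "
              f"excess kurtosis {stats.kurtosis(z):+.3f}")

    # Gaussian naive Bayes on the projected features:
    # one one-dimensional normal density per component and class.
    clf = GaussianNB().fit(Z, y)
    print("training accuracy:", clf.score(Z, y))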

There are three ways in which you can justify whether your assumptions about the distributions of the features are right or not.

The first is that you apply statistical testing. For instance, if you assume that a random variable is normally distributed, then you can use the Kolmogorov-Smirnov test to check whether the normal distribution assumption is valid or not.
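A minimal sketch of such a test in Python (the use of SciPy is my assumption, not part of the lecture). One caveat: the classical Kolmogorov-Smirnov test assumes the reference distribution is fully specified in advance; if you estimate mean and variance from the same sample, the test is optimistic, and a variant such as the Lilliefors test is the cleaner choice.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(0)
    x = rng.normal(loc=2.0, scale=0.5, size=500)   # sample to be tested

    # Standardize and compare against the standard normal CDF.
    z = (x - x.mean()) / x.std(ddof=1)
    res = stats.kstest(z, 'norm')

    # A large p-value means no evidence against the normality assumption
    # at the chosen significance level.
    print(f"KS statistic = {res.statistic:.3f}, p-value = {res.pvalue:.3f}")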

The second way in which you can justify your choice of parametric densities is that you look at the construction of the feature vectors, and from the construction routine you know what type of parametric distribution occurs.

That is true for the PCA, for instance. We know they are normally distributed and mutually independent. We know that.

There is a theory behind it. And the third way to show that density assumptions are valid is something that is completely unacceptable for mathematicians.

Completely unacceptable. But for us as engineers, we say, well, my choice is justified if the system works properly.

And that's not bad. That's not a bad culture. Usually we start with normal distributions, right. If we try something and we want to build a statistical classifier, usually we start with normal distributions.

And I mean, if your classifier shows on the training set already a classification rate of 99.9 percent, I mean, why should you bother and change things? If it works, it's justified, well justified.

That's not a weak argument. That's quite a strong argument. I mean, think about the case where somebody proves to you on a theoretical level that your features follow a certain distribution.

And then you implement it and it does not work because, you know, some circumstances in the practical environment lead to problems that are not considered in the model.

And so it doesn't work. And that does not help, basically. Yeah. If you built a signature reader or a signature recognition system and it misclassifies the signatures, the bank manager will tell you that's bullshit, right.

Access: Open Access

Duration: 01:27:34 min

Recording date: 2009-07-20

Uploaded: 2012-07-30 14:05:38

Language: en-US
