2 - Pattern Recognition [PR] - PR 2 [ID:21818]

Welcome back to pattern recognition. So today we want to continue our short introduction to the topic, and we will talk about the postulates of pattern recognition and also some measures for evaluating the performance of a classification system.

So I'm looking forward to giving you an exciting introduction to the topic.

For classical pattern recognition we have six postulates, and these six postulates essentially tell us under which circumstances it is a good idea to actually apply a pattern recognition system.

The first thing that you should keep in mind is that a set of representative patterns of the problem domain omega should be available. The patterns f(x) are essentially the instances, the observations, that we want to use in order to decide on the class of the respective pattern.

Next, a simple pattern should be characterized by features that determine its membership in a certain class omega kappa.

So this is crucial. You have to have observations with characteristic features, otherwise you will not be able to assign them to a class.

This brings us already to the third postulate: observations of the same class should form a compact domain in the feature space.

And typically those domains should have a small intra-class distance, meaning that the distance between observations of the same class should be small, while the distance between samples of different classes, the inter-class distance, should be high.
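To make this concrete, here is a minimal sketch in Python with NumPy (my own illustration, not from the lecture) that estimates the mean intra-class and inter-class Euclidean distances of a labeled feature set. In a feature space that satisfies this postulate, the first value should be small and the second large.

```python
import numpy as np

def mean_intra_inter_distance(features, labels):
    """Estimate mean intra-class and inter-class Euclidean distances.

    features: (N, D) array of feature vectors
    labels:   (N,)   array of class labels
    """
    n = len(features)
    # Pairwise Euclidean distances between all observations.
    diff = features[:, None, :] - features[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    same = labels[:, None] == labels[None, :]
    # Exclude the zero distance of each observation to itself.
    off_diag = ~np.eye(n, dtype=bool)
    intra = dist[same & off_diag].mean()
    inter = dist[~same].mean()
    return intra, inter

# Two well-separated Gaussian blobs: the intra-class distance comes out
# small and the inter-class distance large.
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
print(mean_intra_inter_distance(x, y))
```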

So you can see here a couple of examples of distributions in the feature space. The top left is a pretty clear case, and the center top image is also still a separable case.

It may not be trivial, but in the top-right case you can still separate the two classes, of course no longer with a line, but with something that is nonlinear.

Then, of course, the classes could also be intermixed. So here you see different distributions of classes, and you see that we are still able to separate the two once we figure out where the instances of a certain class are located.

Also, there could be more complex structures in this high dimensional feature space as shown in the bottom center.

The configuration that you want to avoid is the one on the bottom right, because here it is very hard to separate the two classes: it looks like the samples were essentially drawn from the same distribution.

If you have observations like this, then you may want to reconsider whether your feature space is a good one; maybe you would want to extract different features, or it might even be the case that the problem is not solvable.

It can, of course, also be the case that there are classes that you simply cannot separate with the observations that have been given.

The fourth postulate is that a complex pattern typically consists of simpler parts that stand in certain relations to each other.

So a pattern should be decomposable into those parts.

The fifth postulate is that a complex pattern has a certain structure: not every arrangement of simple parts constitutes a valid pattern, and this means that many patterns can be represented with relatively few constituents.

So here you see an example: the left-hand side shows a valid pattern, while on the right-hand side parts of the image are not valid, as you can see with the musical note placed on the transistor on the right.

So this is not a valid pattern, and therefore we might also want to consider the option of rejecting patterns.

And the last postulate: two patterns are similar if their features or simpler constituents differ only slightly.

Okay, if we find that these postulates are actually applicable to our problem, then we can apply the methods of pattern recognition.

So once you have applied pattern recognition, you probably also want to measure whether your classification was successful, and for this we typically use measures of performance.

And one way of displaying how well your system is performing is looking at a so-called confusion matrix.

So in this confusion matrix, we have essentially one column for every class and one row for every class.

And you see here that we have the hypotheses on the columns and the references on the rows.

So you essentially end up with a square K-by-K matrix, where capital K is the number of classes, and this then allows us to determine how many confusions have taken place.

And it also will tell you which classes are more likely to be confused with which other class.

So this is a very nice means of displaying the classification result and it will work for several classes.

Obviously, this might not be the right choice if you have thousands of classes, but for, let's say, 10 to 20 classes, this is still a very good means of actually displaying the classification result.
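As a small illustration of how such a matrix is accumulated (a sketch in Python, not code from the lecture), with rows as references and columns as hypotheses, as described above:

```python
import numpy as np

def confusion_matrix(references, hypotheses, num_classes):
    """Rows: reference (true) classes, columns: hypothesized classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    for ref, hyp in zip(references, hypotheses):
        cm[ref, hyp] += 1
    return cm

refs = [0, 0, 1, 1, 2, 2, 2]
hyps = [0, 1, 1, 1, 2, 0, 2]
cm = confusion_matrix(refs, hyps, num_classes=3)
print(cm)
# [[1 1 0]
#  [0 2 0]
#  [1 0 2]]
```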

If you want to break it down into fewer numbers, one thing that you can do is you can compute the accuracy or recognition rate.

And here you simply take the sum over all correct classifications, which are the entries on the diagonal of the confusion matrix, and you divide by the total number of observations.

And this gives you the recognition rate.
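In terms of the confusion matrix, this is the trace divided by the sum of all entries; a minimal sketch of my own, reusing the example matrix from above:

```python
import numpy as np

cm = np.array([[1, 1, 0],
               [0, 2, 0],
               [1, 0, 2]])  # confusion matrix from the sketch above

# Correct classifications are on the diagonal.
accuracy = cm.trace() / cm.sum()
print(accuracy)  # 5 correct out of 7 observations, about 0.714
```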

The recognition rate may be biased towards a certain class and therefore also other measures have been introduced.

So here we have recall and precision, where the recall measures the number of correct classifications for a specific class over the total number of observations of that class.

And the precision then is the associated measure.

Again, you look at the correct classifications for the class, but then you essentially normalize along the column instead of the row.

So you normalize with all detected observations of that class.
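As a sketch (again my own illustration, not from the lecture), both measures fall out of the confusion matrix by normalizing the diagonal row-wise for the recall and column-wise for the precision:

```python
import numpy as np

cm = np.array([[1, 1, 0],
               [0, 2, 0],
               [1, 0, 2]])  # rows: references, columns: hypotheses

# Recall: correct classifications of a class over all observations
# of that class (row-wise normalization).
recall = cm.diagonal() / cm.sum(axis=1)

# Precision: correct classifications of a class over all observations
# detected as that class (column-wise normalization).
precision = cm.diagonal() / cm.sum(axis=0)

print(recall)     # [0.5  1.    0.667]
print(precision)  # [0.5  0.667 1.  ]
```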

Then there are also measures like the average recall.

And here you take the recall, or class-wise recognition rate as you could call it, compute the sum over all of the class-wise recognition rates, and take the average.

And this means that you essentially weigh every class as equally important, in contrast to the recognition rate, where every class is weighted with the relative frequency with which it occurs.

So this is a measure that will help you figure out whether you have a bias towards a certain class, and ideally it will be high if you have similar recognition rates for all the classes.
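A minimal sketch of the average recall, continuing the example from above: the per-class recalls are simply averaged with equal weight.

```python
import numpy as np

cm = np.array([[1, 1, 0],
               [0, 2, 0],
               [1, 0, 2]])

# Class-wise recognition rates (recalls), then their unweighted mean:
# every class counts equally, regardless of how often it occurs.
average_recall = (cm.diagonal() / cm.sum(axis=1)).mean()
print(average_recall)  # (0.5 + 1.0 + 0.667) / 3, about 0.722
```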

Then, if you only have two classes, you can also use more elaborate classification measures.

So here you essentially end up with the true positive rate, the false positive rate, the positive predictive value, which is essentially the precision, the negative predictive value and the true negative rate, which is the specificity.

So the true negative rate can also be expressed as one minus the false positive rate.

This then brings us to the accuracy, and the accuracy is given as the true positives plus the true negatives divided by the total number of observations.

And you can also construct the F measure, which is the harmonic mean of recall and precision.

And you can formulate it as two times recall times precision over recall plus precision.
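All of these two-class measures can be computed from the four counts of the 2x2 confusion matrix; here is a small sketch of my own, with made-up counts, that collects them in one function:

```python
def binary_measures(tp, fp, tn, fn):
    """Two-class measures from the counts of a 2x2 confusion matrix."""
    tpr = tp / (tp + fn)          # true positive rate (sensitivity, recall)
    fpr = fp / (fp + tn)          # false positive rate
    tnr = tn / (tn + fp)          # true negative rate (specificity) = 1 - FPR
    ppv = tp / (tp + fp)          # positive predictive value (precision)
    npv = tn / (tn + fn)          # negative predictive value
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    # F measure: harmonic mean of recall and precision.
    f_measure = 2 * tpr * ppv / (tpr + ppv)
    return tpr, fpr, tnr, ppv, npv, accuracy, f_measure

print(binary_measures(tp=40, fp=10, tn=45, fn=5))
```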

One very sophisticated way of looking at different classifiers is the so-called receiver operating characteristic curve, the ROC curve.

And this essentially lets you look at the classification under different decision thresholds.
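A minimal sketch of this idea (my own illustration, with made-up scores and labels): each distinct classifier score is used once as the decision threshold, and for each threshold one (false positive rate, true positive rate) point of the ROC curve is computed.

```python
import numpy as np

def roc_curve(scores, labels):
    """ROC points obtained by sweeping a decision threshold over the scores.

    scores: classifier outputs, higher means "more positive"
    labels: ground truth, 1 = positive, 0 = negative
    """
    scores, labels = np.asarray(scores), np.asarray(labels)
    points = []
    # Each distinct score acts once as the decision threshold.
    for t in np.sort(np.unique(scores))[::-1]:
        predicted = scores >= t
        tpr = (predicted & (labels == 1)).sum() / (labels == 1).sum()
        fpr = (predicted & (labels == 0)).sum() / (labels == 0).sum()
        points.append((fpr, tpr))
    return points

scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]
labels = [1, 1, 0, 1, 0, 0]
print(roc_curve(scores, labels))
```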

Part of a video series

Accessible via: Open Access

Duration: 00:16:21 min

Recording date: 2020-10-25

Uploaded on: 2020-10-26 00:46:56

Language: en-US

In this video, we present the postulates of pattern recognition and measures of evaluation for classification systems.

This video is released under CC BY 4.0. Please feel free to share and reuse.

For reminders about new videos, follow us on Twitter or LinkedIn. Also, join our network for information about talks, videos, and job offers in our Facebook and LinkedIn groups.

Music Reference: Damiano Baldoni - Thinking of You
