16 - Pattern Recognition PR

The following content has been provided by the University of Erlangen-Nürnberg.

So before we continue with the chapter on optimization and before we introduce the support vector machines,

let's briefly summarize where we are currently in terms of our storyline.

So this winter semester we are talking about pattern recognition,

and in particular we are focusing on classifiers,

we are focusing on the definition of decision boundaries

and we have introduced basically the concept of the Bayesian classifier,

which makes use of a loss function.

A loss is associated with each decision we make in pattern recognition.

If we make the right decision, we usually do not incur any loss.

If we make the wrong decision, we incur a loss.

If the loss for wrong decisions is constant, we basically have a 0-1 loss function.

And it turns out with respect to this type of loss function,

the best thing you can do to reduce the average loss is to maximize the a posteriori probability.
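To see why (a standard argument, sketched here): under a 0-1 loss, the expected loss of deciding for class y given the observation is

\[
E[\,\mathrm{loss} \mid \vec{x},\ \mathrm{decide}\ y\,] = \sum_{y' \neq y} p(y' \mid \vec{x}) = 1 - p(y \mid \vec{x}),
\]

so minimizing the average loss is the same as maximizing the posterior probability.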

So we decide for the class with the maximum a posteriori probability,

where x is our d-dimensional feature vector and y is our categorical class variable, the class number.
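Written out, this is the maximum a posteriori (MAP) decision rule:

\[
\hat{y} = \arg\max_{y}\; p(y \mid \vec{x}).
\]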

That is the basic idea.

If y is a continuous-valued number, or even a vector with real-valued components,

then we talk about regression instead of classification.

So these are the two terms you have to remember.

Classification is the assignment of observations to categorical variables;

regression is the assignment of observations to continuous variables.

Then we have seen that the posterior probability can be rewritten

as the product of the prior and the class-conditional probability.

The posterior is proportional to this product, and identical to it if you divide by the evidence p(x).
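In formulas, this is just Bayes' rule:

\[
p(y \mid \vec{x}) = \frac{p(y)\, p(\vec{x} \mid y)}{p(\vec{x})} \;\propto\; p(y)\, p(\vec{x} \mid y).
\]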

And now we have two choices.

If we want to characterize a classification problem by its posteriors,

we can either represent the posteriors directly, or we can represent them

using this decomposition into priors and class-conditionals.

This is discriminative versus generative modeling.

Both models are hopefully discriminative.

The generative one is the decomposition into prior and class-conditional; the direct representation of the posterior is the discriminative one.

Then we have discussed the Gaussian classifier. What does the Gaussian classifier do?

The Gaussian classifier uses the normal distribution for the class-conditional probability.

This is the normal distribution with mean vector μ and covariance matrix Σ.
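For reference, the class-conditional density per class y is then

\[
p(\vec{x} \mid y) = \mathcal{N}(\vec{x};\, \vec{\mu}_y, \Sigma_y)
= \frac{1}{(2\pi)^{d/2}\, |\Sigma_y|^{1/2}}
\exp\!\Big( -\tfrac{1}{2} (\vec{x} - \vec{\mu}_y)^{\mathsf{T}} \Sigma_y^{-1} (\vec{x} - \vec{\mu}_y) \Big).
\]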

And we studied here some properties of the Gaussian classifier.

It turns out that the decision boundary is a quadratic function.

And if the classes share the same covariance matrix, we have linear decision boundaries.
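The reason, sketched as a standard derivation rather than quoted from the lecture: the decision boundary between two classes is where the log posterior ratio vanishes,

\[
\log \frac{p(y_1 \mid \vec{x})}{p(y_2 \mid \vec{x})}
= \log \frac{p(y_1)}{p(y_2)} + \log \frac{p(\vec{x} \mid y_1)}{p(\vec{x} \mid y_2)} = 0,
\]

and with Gaussian class-conditionals the second term contains the quadratic forms \( -\tfrac{1}{2}\vec{x}^{\mathsf{T}} \Sigma_y^{-1} \vec{x} \). If \( \Sigma_{y_1} = \Sigma_{y_2} \), these quadratic terms cancel and only terms linear in \( \vec{x} \) remain.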

Then we also looked into the problem of how to compute the posterior probability

given a certain decision boundary expressed as a zero-level set, F(x) = 0.

And we found out that we can use the sigmoid function to do so.

And we have introduced the sigmoid function for this purpose.
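In the form used here,

\[
g(z) = \frac{1}{1 + e^{-z}}, \qquad p(y = 1 \mid \vec{x}) = g\big(F(\vec{x})\big),
\]

where F(x) = 0 describes the decision boundary as above.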

And we talked about logistic regression.
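To make this concrete, here is a minimal sketch in Python of how such a classifier turns a decision function into a posterior; the linear form F(x) = wᵀx + b and the toy numbers are assumptions for illustration, not the lecture's example:

```python
import numpy as np

def sigmoid(z):
    # Logistic sigmoid g(z) = 1 / (1 + exp(-z))
    return 1.0 / (1.0 + np.exp(-z))

def posterior(x, w, b):
    # Posterior p(y=1 | x) = g(F(x)) for a linear decision
    # function F(x) = w^T x + b; F(x) = 0 is the decision boundary.
    return sigmoid(np.dot(w, x) + b)

# Toy example (hypothetical weights, for illustration only):
w = np.array([1.5, -0.5])   # weight vector defining the boundary
b = 0.2                     # bias / offset of the boundary
x = np.array([0.4, 1.0])    # a 2-dimensional feature vector

p = posterior(x, w, b)
print(p)                    # p(y=1 | x); decide class 1 if p > 0.5
```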

Yes?

Objective function.

No, "objective function" in German means "Zielfunktion".

That's the function you want to optimize.

If you want to solve a problem, you set up a function where the position of the optimum tells you something

about the parameters you are looking for or the point you are looking for.
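A standard example: in logistic regression, the objective function is the log-likelihood of the training data,

\[
L(\vec{\theta}) = \sum_{i=1}^{m} \Big[ y_i \log g\big(F(\vec{x}_i)\big) + (1 - y_i) \log\big(1 - g(F(\vec{x}_i))\big) \Big],
\]

and the position of its maximum yields the parameters \( \vec{\theta} \) of the decision boundary.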

The sigmoid function.
