The following content has been provided by the University of Erlangen-Nürnberg.
So before we continue with the chapter on optimization and before we introduce the support vector machines,
let's briefly summarize where we are currently in terms of our storyline.
So this winter semester we are talking about pattern recognition,
and in particular we are focusing on classifiers,
that is, on the definition of decision boundaries,
and we have introduced the concept of the Bayesian classifier,
which makes use of a loss function.
A loss is associated with each decision we make in pattern recognition.
If we make the correct decision, we usually do not generate any loss.
If we make a wrong decision, we generate a loss.
If the loss for wrong decisions is constant, we basically have a 0-1 loss function.
And it turns out that with respect to this type of loss function,
the best thing you can do to minimize the average loss is to maximize the a posteriori probability.
So we decide for the class with the maximum a posteriori probability,
where x is our d-dimensional feature vector and y is our categorical class number, our categorical variable.
That's the basic idea.
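Written out once in symbols (a brief recap, assuming the standard notation of the lecture with feature vector x and class label y), the Bayesian decision rule under the 0-1 loss is

$$ \hat{y} = \operatorname*{argmax}_{y} \; p(y \mid x), $$

i.e., we decide for the class with the maximum posterior probability.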
If y is a continuous-valued number or even a vector with real-valued components,
then we talk about regression instead of classification.
So these are the two terms you have to remember:
classification is the assignment of observations to categorical variables,
regression is the assignment of observations to continuous variables.
Then we have seen that the posterior probability can be rewritten
as a product of the prior with a class-conditional probability.
The posterior is proportional to that product, or identical to it if you divide by the evidence p(x).
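In symbols (standard notation; the slide itself is not reproduced here), Bayes' rule gives

$$ p(y \mid x) \;=\; \frac{p(y)\, p(x \mid y)}{p(x)} \;\propto\; p(y)\, p(x \mid y), $$

where p(y) is the prior, p(x | y) the class-conditional probability, and p(x) the evidence.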
And now we have two choices.
If we want to characterize a classification problem by its posteriors,
we can either represent the posteriors directly, or we can represent them
using this decomposition in terms of the priors and the class conditionals.
This is discriminative versus generative modeling.
Both models are hopefully discriminative in the sense that they separate the classes well;
the direct representation of the posteriors is the discriminative one, and the decomposition into priors and class conditionals is the generative one.
And then we have discussed the Gaussian classifier. What is done by the Gaussian classifier?
The Gaussian classifier uses the normal distribution for the class conditional,
that is, the normal distribution with mean vector μ and covariance matrix Σ.
And we studied some properties of the Gaussian classifier.
It turns out that the decision boundary is a quadratic function.
And if the classes share the same covariance matrix, we have linear decision boundaries.
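As a sketch of why this is the case (the standard derivation, not verbatim from the slides): the class conditional is the multivariate normal density

$$ p(x \mid y) \;=\; \mathcal{N}(x;\, \mu_y, \Sigma_y) \;=\; \frac{1}{(2\pi)^{d/2}\, |\Sigma_y|^{1/2}} \exp\!\Big( -\tfrac{1}{2} (x - \mu_y)^{\top} \Sigma_y^{-1} (x - \mu_y) \Big). $$

Taking the logarithm, the decision function between two classes contains the quadratic term $-\tfrac{1}{2} x^{\top} (\Sigma_1^{-1} - \Sigma_2^{-1})\, x$; if $\Sigma_1 = \Sigma_2$, this term cancels and only terms linear in x remain, which is why shared covariance matrices lead to linear decision boundaries.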
Then we also looked into the problem of how to compute the posterior probability
given a certain decision boundary expressed as a zero-level set, F(x) = 0.
And then we found out that we can use the sigmoid function to do so.
So we have introduced the sigmoid function here,
and we talked about logistic regression.
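As a small numerical sketch of this connection (my own illustration, not code from the lecture; it assumes two Gaussian classes with equal priors and a shared identity covariance), the posterior of one class equals the sigmoid applied to a linear decision function F(x):

```python
import numpy as np

def sigmoid(a):
    # logistic (sigmoid) function g(a) = 1 / (1 + exp(-a))
    return 1.0 / (1.0 + np.exp(-a))

# two Gaussian classes with equal priors and a shared identity covariance (assumed setup)
mu1, mu2 = np.array([1.0, 0.0]), np.array([-1.0, 0.0])

def posterior_class1(x):
    # posterior of class 1 via Bayes' rule; the shared Gaussian normalization
    # constant cancels in the ratio, so unnormalized densities suffice
    d1 = np.exp(-0.5 * np.sum((x - mu1) ** 2))
    d2 = np.exp(-0.5 * np.sum((x - mu2) ** 2))
    return d1 / (d1 + d2)

def F(x):
    # linear decision function obtained from the log-likelihood ratio of the two Gaussians
    return (mu1 - mu2) @ x - 0.5 * (mu1 @ mu1 - mu2 @ mu2)

x = np.array([0.3, -0.7])
print(posterior_class1(x))  # posterior computed directly from the densities
print(sigmoid(F(x)))        # identical value: sigmoid of the linear function F(x)
```

This is exactly the link between the Gaussian classifier with shared covariances and logistic regression: the posterior is a sigmoid applied to a linear function of the features.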
Yes?
Objective function.
No, objective function in German means Zielfunktion.
That's the function you want to optimize.
If you want to solve a problem, you set up a function where the position of the optimum tells you something
about the parameter you are looking for or the point you are looking for.
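A tiny worked example of that idea (my own illustration, not from the lecture): to estimate a location parameter θ from observations x_1, ..., x_n, one can set up the least-squares objective

$$ J(\theta) = \sum_{i=1}^{n} (x_i - \theta)^2, \qquad \hat{\theta} = \operatorname*{argmin}_{\theta} J(\theta) = \frac{1}{n} \sum_{i=1}^{n} x_i, $$

so the position of the optimum of the objective function is exactly the parameter we are looking for (here, the sample mean).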
The sigmoid function.