This audio is presented by the University of Erlangen-Nürnberg.
So just a very brief recap.
So we said, okay, we can sort of imagine what a decision tree is.
A decision tree is a tree structure where at each internal node we formulate a question about our data,
like: is the top part of the image blue? Maybe it is sky.
Is the bottom part of our image green? You could continue to ask that at the next node.
And then if both answers are positive, like yes, you could make a guess, like oh, this is an outdoor image,
because the blue in the top part could be sky, and the green in the bottom part could be grass, or something like that.
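To make these yes/no questions a bit more tangible, here is a minimal sketch in Python; the two color checks are hypothetical placeholders for real image features, and an RGB channel layout is assumed.

```python
import numpy as np

def is_top_blue(img: np.ndarray) -> bool:
    # Hypothetical feature: is blue the dominant channel in the top half?
    top = img[: img.shape[0] // 2]
    return top[..., 2].mean() > max(top[..., 0].mean(), top[..., 1].mean())

def is_bottom_green(img: np.ndarray) -> bool:
    # Hypothetical feature: is green the dominant channel in the bottom half?
    bottom = img[img.shape[0] // 2 :]
    return bottom[..., 1].mean() > max(bottom[..., 0].mean(), bottom[..., 2].mean())

def classify(img: np.ndarray) -> str:
    # Each internal node asks one yes/no question about the image.
    if is_top_blue(img):            # question 1: blue sky on top?
        if is_bottom_green(img):    # question 2: green grass below?
            return "outdoor"        # both answers positive -> guess outdoor
    return "other"                  # a real tree would keep asking questions here
```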
Now the idea is that a single decision tree may have difficulties, with these yes-no questions,
actually covering all eventualities that may occur in our data set.
So instead of growing one huge tree that asks all possible questions,
we rather take a smaller tree maybe and replicate it a couple of times and ask different questions in each tree.
And many trees make a forest, so these are random forests.
They are random because we randomize the way in which we generate these questions.
So that was the whole idea.
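As a rough sketch of the forest part, assuming we already have a list of trained trees, each grown with its own randomized questions, a prediction could simply be the majority vote over the individual trees; the tree.predict interface here is an assumption for illustration, not a fixed API.

```python
from collections import Counter

def forest_predict(trees, sample):
    # Each tree answers with a class label; the forest takes the majority vote.
    # `trees` is assumed to be a list of objects with a .predict(sample) method.
    votes = Counter(tree.predict(sample) for tree in trees)
    return votes.most_common(1)[0][0]
```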
And we wrote down a couple of things along the way.
The last thing we did was to introduce the information gain as a criterion
for what might be a good question at a certain point.
So we could say a good question is a question that maximizes our information gain
with this subset of data that we have at a certain node,
because if we start at the root node with the whole data set,
then the first question evaluates to true for part of it and to false for the rest.
So at the next internal node, we only see a part of the data set,
the subsection for which question one was true.
And for this subset of the data set at this node, we say,
oh, okay, what could be the next best question to maximize our information gain?
And intuitively, information gain is a metric that measures how nicely a question separates our data.
So an example of a good separation would be if we start with data that is roughly equally distributed,
let's say we have a three-class problem.
And if we are able, for instance, to come up with a question that, for our training data,
separates this into two histograms where, let's say, one class is completely contained in one histogram,
with maybe a little bit of noise left over,
and the other two classes end up in the other histogram,
then we would say, oh, this looks like a good question,
because if we actually have to deal with this class one,
then we can be relatively sure that we can catch it with this question.
So the information gain at this point would be good, and we would know this is a good question.
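This intuition is exactly what the standard information gain formula captures: the entropy of the class distribution at the node minus the size-weighted entropies of the two subsets a question produces. Here is a small sketch, with an illustrative three-class toy example in the spirit of the lecture:

```python
import numpy as np

def entropy(labels: np.ndarray) -> float:
    # Shannon entropy of the empirical class distribution (in bits).
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log2(p)).sum())

def information_gain(labels, answers) -> float:
    # Entropy at the node minus the size-weighted entropies of the
    # two children produced by a candidate yes/no question.
    labels = np.asarray(labels)
    answers = np.asarray(answers, dtype=bool)
    yes, no = labels[answers], labels[~answers]
    weighted = (len(yes) * entropy(yes) + len(no) * entropy(no)) / len(labels)
    return entropy(labels) - weighted

# Toy example: a question that cleanly isolates class 1
# from classes 2 and 3 yields a high gain.
labels  = np.array([1, 1, 1, 2, 2, 3, 3])
answers = np.array([True, True, True, False, False, False, False])
print(information_gain(labels, answers))  # ~0.985 bits
```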
So we have two or three more points to discuss with this abstract tree model
before we actually go into applications. With applications, I mean classification,
and afterwards regression.
So one thing we haven't really talked about yet is the leaf prediction model.
So how do we come up with a class decision once we reach the end of the tree?
So the end of the tree is the leaf, and we have already seen, okay,
that per node we always have a distribution of training samples,
and we know to which class they belong, because it's our training data.
So for the leaf prediction model, we could just say, in the most general case,
but this should be good enough for us,
that we decide for the class c* that maximizes the probability of class c given that we ended up in leaf v,
that is, c* = argmax_c p(c | v).
So at the end of the day, all we do is look at this histogram and pick the class with the highest count.
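As a final sketch: the leaf just stores the class histogram of the training samples that reached it, and the prediction c* = argmax_c p(c | v) is a plain argmax over that histogram (normalizing to probabilities would not change the winner). The list-of-labels representation here is just one convenient choice.

```python
from collections import Counter

def leaf_prediction(leaf_labels):
    # Histogram of training labels that ended up in this leaf; the
    # most frequent class is c* = argmax_c p(c | v).
    histogram = Counter(leaf_labels)
    return max(histogram, key=histogram.get)

print(leaf_prediction(["outdoor", "outdoor", "indoor", "outdoor"]))  # -> outdoor
```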