43 - Recap Clip 8.5: Using Information Theory (Part 1) [ID:30446]

And the idea is that we can actually predict how good an attribute is going to be by looking

at the information gain in the answer.

The answer we are after is the answer to the overall question: will I wait or won't I wait?

In our example that is essentially a one-bit question, because the example set contains six examples where we will wait and six examples where we won't wait.

So the prior is the same as for, say, an unloaded coin.

So we actually have one bit of information to gain from the example question, and in the beginning we therefore have one bit of information to gain in the decision tree.
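As a quick worked check (only implicit in the clip; B(q) denotes the entropy of a Boolean variable with prior q): with six positive and six negative examples the prior is q = 6/12 = 1/2, and

\[ B(1/2) = -\left(\tfrac{1}{2}\log_2\tfrac{1}{2} + \tfrac{1}{2}\log_2\tfrac{1}{2}\right) = 1 \text{ bit}. \]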

While developing the tree, we are hoping to reach nodes where only one value is left, forced on us.

At such a node we have no more information left to gain.

It is basically the same as a coin that always falls on heads: answering the question "will it fall on heads?" carries no information, because we already know what it will do.

And if you think about the leaves in the tree, that's exactly the situation.

We know what to do there.
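In the same notation (again a worked check, not from the clip), a pure node, say all positive, has prior q = 1 and

\[ B(1) = -\left(1\cdot\log_2 1 + 0\cdot\log_2 0\right) = 0 \text{ bits}, \]

using the usual convention that 0 times log_2 0 counts as 0.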

So somewhere between "we have one bit of information to gain" and "we have zero bits of information to gain", each step of building the tree has to gain some information, and we would like each step to have high information gain.

So the idea here is that we define the information gain of an attribute as the difference between the information of the actual distribution in the node and the information we expect to still need after testing that attribute.

We are always comparing what we expect at this point with what we actually have in our situation.

The problem we still have is: what is "information"? We simply define it via the entropy, which is a standard formula: essentially, you take the prior probabilities, take their logarithm, weight them by the probability values, and add them all up.
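Written out (the standard definition being referred to), the entropy of a distribution with prior probabilities p_1, ..., p_n is

\[ H = -\sum_{i=1}^{n} p_i \log_2 p_i, \]

and the gain of an attribute A that splits the example set E into subsets E_1, ..., E_k is the entropy at the node minus the expected entropy after the split:

\[ \mathrm{Gain}(A) = H(E) - \sum_{k} \frac{|E_k|}{|E|}\, H(E_k). \]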

So that gives us a measure, and that measure allows us to say that the gain of Patrons is much higher than the gain of, say, Type.
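As a minimal Python sketch of this computation (not from the clip): the split counts below are my assumption, taken from the standard restaurant example in the textbook, where Patrons splits the twelve examples into None (0 wait / 2 don't), Some (4 / 0) and Full (2 / 4), and Type splits them into four evenly mixed groups.

from math import log2

def entropy(counts):
    # Entropy (in bits) of a class distribution given as a list of counts.
    total = sum(counts)
    return -sum((c / total) * log2(c / total) for c in counts if c > 0)

def information_gain(parent_counts, child_splits):
    # Entropy at the node minus the expected entropy after the split.
    total = sum(parent_counts)
    remainder = sum(sum(s) / total * entropy(s) for s in child_splits)
    return entropy(parent_counts) - remainder

# [will wait, won't wait] counts: 6/6 at the root.
parent = [6, 6]

# Assumed per-value split counts (standard textbook example).
patrons = [[0, 2], [4, 0], [2, 4]]           # None, Some, Full
type_ = [[1, 1], [1, 1], [2, 2], [2, 2]]     # French, Italian, Thai, Burger

print(information_gain(parent, patrons))  # about 0.541 bits
print(information_gain(parent, type_))    # 0.0 bits

With these counts Patrons gains about half a bit while Type gains nothing, which is exactly the comparison made above.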

So what do we do? We make Patrons the root of the tree, because that already answers more than half of the question.

And then we recursively do whatever is necessary next.

So that's the idea.

And we see that we get a nice, small, little tree for that.

The tree could be much worse.

We might have expected a tree of depth up to 10, because we have 10 attributes; here we have depth 4.

Remember that trees grow exponentially with depth, so it makes a real difference whether the depth is 4 or 10.
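To make the "exponential" point concrete: with Boolean tests, a tree of depth d can have up to 2^d leaves, so depth 4 means at most 2^4 = 16 leaves, while depth 10 could allow up to 2^10 = 1024 (and even more for attributes with more than two values).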

So that's really the value of information here in this algorithm.

If you now look at how well this works, by looking at the error rate or, in this case, the fraction of correct predictions that this decision tree learning algorithm makes on test data, given training sets of varying size, we see that we actually reach somewhere near 100%.

Main video on this topic: chapter 8, clip 5.
