7 - Knowledge Discovery in Databases [ID:53205]
Okay, so welcome back to KDD.

Today we will finish up the frequent pattern lecture and we will get into the first parts

of the classification chapter.

First question before we really start, are there any questions regarding last week's

lecture?

I don't think so.

Okay, then just a quick summary.

We already talked about how to get to frequent item sets.

We also talked about how to get association rules from frequent item sets quickly at the

And now we have to talk about the evaluation of our patterns, because it is not guaranteed

that any frequent item set or association rule we find is really meaningful or interesting.

Of course we make sure that they satisfy our support threshold, they satisfy our thresholds

in general.

However, they can still be misleading association rules.

For example, take the association rule: playing basketball leads to eating cereal.

We know that its support is 40% and its confidence is 66.7%.

Then this might, and this is just an example of course, be misleading if the overall percentage

of students eating cereal is 75%.

Of course, then playing basketball actually leads to eating less cereal than not playing

basketball.

So even an association rule satisfying our thresholds might be misleading.

In this case, it's just an example again.

The rule basketball leads to eating no cereal actually would be more accurate, because now

we know that 33.3% of persons eat no cereal if they play basketball, while only 25% of all

students eat no cereal.
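
The comparison just made can be sketched in a few lines of Python. This is only an illustrative sketch: the variable names and the helper `beats_base_rate` are made up for this example, and the only inputs are the percentages quoted in the lecture (40% support, 66.7% confidence, 75% of students eating cereal). The idea is that a rule is only informative if its confidence is higher than the overall share of the consequent.

```python
# Figures quoted in the lecture example, as fractions of all students.
support_b_c = 0.40       # P(basketball and cereal)
confidence_b_c = 2 / 3   # P(cereal | basketball), quoted as 66.7%
p_cereal = 0.75          # overall share of students eating cereal

# Quantities derived from the figures above.
p_basketball = support_b_c / confidence_b_c   # P(basketball)
confidence_b_not_c = 1 - confidence_b_c       # P(no cereal | basketball)
p_not_cereal = 1 - p_cereal                   # overall share eating no cereal


def beats_base_rate(confidence, base_rate):
    """Hypothetical helper: is the rule better than the consequent's base rate?"""
    return confidence > base_rate


# basketball -> cereal: 66.7% confidence against a 75% base rate
print(beats_base_rate(confidence_b_c, p_cereal))
# basketball -> no cereal: 33.3% confidence against a 25% base rate
print(beats_base_rate(confidence_b_not_c, p_not_cereal))
```

The first rule has the higher confidence but falls below its base rate, which is exactly why it is misleading.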

This might happen if we have negative correlation.

To measure something like that, we can apply an interestingness measure.

I will help her out for a second.

The best way to get in is up there.

It's called lift.

Lift tries to find dependent or correlated events.

If we have a lift value of one, then the events are independent.

Below one, they are negatively correlated; above one, they are positively correlated.
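
Written out, the measure described here is usually defined as follows. The symbols A and B (for the two events, for example playing basketball and eating cereal) are notation chosen for this sketch and are not taken from the slides.

```latex
\[
  \operatorname{lift}(A, B) = \frac{P(A \cap B)}{P(A)\,P(B)},
  \qquad
  \operatorname{lift}(A, B)
  \begin{cases}
    = 1 & \text{$A$ and $B$ are independent} \\
    < 1 & \text{$A$ and $B$ are negatively correlated} \\
    > 1 & \text{$A$ and $B$ are positively correlated}
  \end{cases}
\]
```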

Let's take our example back to that.

We have our amount of values.

We have playing basketball and eating cereal at 2000 entries, no cereal and playing basketball

at 1000 values and so on and so on.

I think you should already be aware how to read this table.

Now we can compute the lift by taking the percentage of people playing

basketball and eating cereal.

So 2000 divided by the total count of 5000, and that divided by the product of the

individual probabilities of eating cereal and playing basketball.

If both were independent, of course we would get one.

However, in this case, for the case playing basketball and eating cereal, we end up with

0.89 and for the case of playing basketball and not eating cereal, we end up above one.

So 1.33, which is, as we discussed earlier, more accurate for our case because we are

above one.
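
The calculation above can be reproduced with a short Python sketch. The function name `lift` and the variable names are chosen for this illustration. The counts 2000, 1000 and 5000 are the ones read from the table in the lecture; the count of 3750 cereal eaters is an assumption derived from the 75% figure quoted earlier.

```python
def lift(n_ab, n_a, n_b, n_total):
    """Lift of events A and B from absolute counts: P(A and B) / (P(A) * P(B))."""
    p_ab = n_ab / n_total
    p_a = n_a / n_total
    p_b = n_b / n_total
    return p_ab / (p_a * p_b)


n_total = 5000
n_basketball_cereal = 2000
n_basketball_no_cereal = 1000
n_basketball = n_basketball_cereal + n_basketball_no_cereal  # 3000
n_cereal = 3750                    # assumption: 75% of 5000 students
n_no_cereal = n_total - n_cereal   # 1250

lift_cereal = lift(n_basketball_cereal, n_basketball, n_cereal, n_total)
lift_no_cereal = lift(n_basketball_no_cereal, n_basketball, n_no_cereal, n_total)

print(round(lift_cereal, 2))     # 0.89, below one: negatively correlated
print(round(lift_no_cereal, 2))  # 1.33, above one: positively correlated
```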

This is just one example of an interestingness measure for frequent patterns.

We have a lot more of them.

This is just the first page.

Part of a video series :

Accessible via

Open access

Duration

01:34:27 min

Recording date

2024-06-10

Uploaded on

2024-06-11 11:36:03

Language

en-US
