Okay, so welcome back to KDD.
Today we will finish up the frequent pattern lecture and we will get into the first parts
of the classification chapter.
First question before we really start, are there any questions regarding last week's
lecture?
I don't think so.
Okay, then just a quick summary.
We already talked about how to get to frequent item sets.
We also talked about how to get association rules from frequent item sets, quickly, at the
end.
And now we have to talk about the evaluation of our patterns, because it is not guaranteed
that any frequent item set or association rule we find is really meaningful or interesting.
Of course we make sure that they satisfy our support threshold, they satisfy our thresholds
in general.
However, they can still be misleading association rules.
For example, take the association rule: playing basketball leads to eating cereal.
We know that its support is 40% and its confidence is 66.7%.
Then this might, and this is just an example of course, be misleading if the overall percentage
of students eating cereal is 75%.
Of course, then playing basketball actually leads to eating less cereal than not playing
basketball.
So even an association rule satisfying our thresholds might be misleading.
In this case, it's just an example again.
The rule basketball leads to eating no cereal would actually be more accurate, because now
we know that 33.3% of students eat no cereal if they play basketball, while only 25% of all
students eat no cereal.
This might happen if we have negative correlation.
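To make the numbers concrete, here is a minimal Python sketch of the support and confidence of the rule, compared against the overall cereal rate. It assumes the counts from the lecture's example table (2000 students play basketball and eat cereal, 1000 play basketball and eat no cereal, 5000 students in total); the function and variable names are made up for illustration.

```python
def support(joint_count: int, total: int) -> float:
    """Fraction of all records that contain both sides of the rule."""
    return joint_count / total


def confidence(joint_count: int, antecedent_count: int) -> float:
    """Fraction of records with the antecedent that also contain the consequent."""
    return joint_count / antecedent_count


# Counts and rates as stated in the lecture example.
TOTAL = 5000
BASKETBALL_AND_CEREAL = 2000
BASKETBALL_NO_CEREAL = 1000
CEREAL_BASE_RATE = 0.75  # overall share of students eating cereal

# Everyone who plays basketball, with or without cereal.
basketball = BASKETBALL_AND_CEREAL + BASKETBALL_NO_CEREAL

rule_support = support(BASKETBALL_AND_CEREAL, TOTAL)
rule_confidence = confidence(BASKETBALL_AND_CEREAL, basketball)

# The rule passes typical thresholds, yet its confidence is below the
# base rate of the consequent, which is what makes it misleading.
is_misleading = rule_confidence < CEREAL_BASE_RATE

print(f"support    = {rule_support:.1%}")
print(f"confidence = {rule_confidence:.1%}")
print(f"base rate  = {CEREAL_BASE_RATE:.1%}")
print("misleading:", is_misleading)
```

Comparing the confidence against the base rate of the consequent is exactly the check that support and confidence thresholds alone do not perform.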
To measure something like that, we can apply an interestingness measure.
I will help her out for a second.
The best way to get in is up there.
It's called lift.
Lift tries to find dependent or correlated events.
If the lift has a value of one, then the events are independent.
Below one, they are negatively correlated; above one, they are positively correlated.
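Written out, the usual definition of lift for two events looks as follows. The symbols A and B are placeholders chosen here, not notation from the lecture.

```latex
\[
\operatorname{lift}(A, B)
  = \frac{P(A \wedge B)}{P(A)\,P(B)}
  = \frac{\operatorname{confidence}(A \Rightarrow B)}{P(B)}
\]
% lift = 1 : A and B are independent
% lift < 1 : A and B are negatively correlated
% lift > 1 : A and B are positively correlated
```

The second form shows why lift catches the misleading rule: it divides the confidence by the base rate of the consequent.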
Let's go back to our example.
We have our counts.
We have playing basketball and eating cereal at 2000 entries, no cereal and playing basketball
at 1000 entries, and so on and so on.
I think you should already be aware how to read this table.
Now we can compute the lift by taking the fraction of people playing basketball and eating
cereal, so 2000 divided by the total count of 5000, and dividing it by the product of the
individual probabilities of eating cereal and playing basketball.
If both were independent, of course we would get one.
However, in this case, for the case playing basketball and eating cereal, we end up with
0.89 and for the case of playing basketball and not eating cereal, we end up above one.
So 1.33, which is, as we discussed earlier, more accurate for our case because we are
above one.
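The two lift values can be reproduced with a short Python sketch. It assumes the counts from the example (2000 and 1000 basketball players with and without cereal, 5000 students in total) and derives the cereal count from the stated overall rate of 75%; the helper name `lift` and the variable names are illustrative only.

```python
def lift(joint_count: int, a_count: int, b_count: int, total: int) -> float:
    """Joint probability divided by the product of the two marginal probabilities."""
    p_joint = joint_count / total
    p_a = a_count / total
    p_b = b_count / total
    return p_joint / (p_a * p_b)


# Counts as stated in the lecture example.
TOTAL = 5000
BASKETBALL_AND_CEREAL = 2000
BASKETBALL_NO_CEREAL = 1000
CEREAL = 3750  # 75% of 5000, from the overall cereal rate

# Marginal counts derived from the values above.
basketball = BASKETBALL_AND_CEREAL + BASKETBALL_NO_CEREAL
no_cereal = TOTAL - CEREAL

lift_cereal = lift(BASKETBALL_AND_CEREAL, basketball, CEREAL, TOTAL)
lift_no_cereal = lift(BASKETBALL_NO_CEREAL, basketball, no_cereal, TOTAL)

print(f"lift(basketball, cereal)    = {lift_cereal:.2f}")
print(f"lift(basketball, no cereal) = {lift_no_cereal:.2f}")
```

A value below one for the first pair and above one for the second matches the reading given in the lecture.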
This is just one example of an interestingness measure for frequent patterns.
We have a lot more of them.
This is just the first page.
Accessible via: Open access
Duration: 01:34:27
Recording date: 2024-06-10
Uploaded: 2024-06-11 11:36:03
Language: en-US