Okay, welcome to the next lecture in KDD.
Most of you weren't here on Friday, so I guess our entry into the OLAP chapter isn't
known to everybody in here.
I don't think you need that to understand what we are talking about today, but we have
to do a short recap so that you are at the point where we are.
Last week we finished up the pre-processing chapter and we talked about OLAP in some detail.
We stopped at this slide, which is basically the continuation of the different kinds
of schemata we have within an OLAP database.
An OLAP database is an online analytical processing database, which is tailored to analytical
processing, so basically what we are doing in here.
For a database like that, we often use a relational database.
And this relational database basically has a core fact table and some dimensions based
on that fact table.
Basically you have to think about that in a way that, if we have three dimensions,
we have a three-dimensional cube.
So with the three dimensions, let's use time, let's use location, and let's use branch.
Then we can select specific sub-cubes within that bigger cube by limiting specific ranges
within the time.
For example, if we say we only want measurements out of 2023, then we can cut a slice out of
that bigger cube.
And within that bigger cube, we have little measurements which are the basic entries
within that thing.
Basically a measurement within this specific star schema might be something like: time is
8 o'clock on the 1st of April, 2023,
branch is X, location is Erlangen, and then we have three measurements.
One is units sold, let's say 13, then dollars sold, 1000, and of course, I just don't have
any space for that, average sold as well.
And we have multiple measurements that fall into that specific slice in here.
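A minimal sketch of the idea so far, assuming each measurement is stored as a record with the three dimensions (time, branch, location) and its measures. The field names, branch names, cities, and all values are made up for illustration and do not come from the lecture slides.

```python
from datetime import datetime

# Hypothetical fact records: three dimensions plus two measures.
# All names and values are invented for this illustration.
facts = [
    {"time": datetime(2023, 4, 1, 8), "branch": "X", "location": "Berlin",
     "units_sold": 13, "dollars_sold": 1000.0},
    {"time": datetime(2023, 7, 15, 12), "branch": "Y", "location": "Berlin",
     "units_sold": 5, "dollars_sold": 400.0},
    {"time": datetime(2022, 11, 3, 9), "branch": "X", "location": "Munich",
     "units_sold": 8, "dollars_sold": 650.0},
]


def slice_by_year(records, year):
    """Keep only the records whose time dimension falls in the given year."""
    return [r for r in records if r["time"].year == year]


# Cutting the 2023 slice out of the bigger cube.
slice_2023 = slice_by_year(facts, 2023)
print(len(slice_2023), "measurements fall into the 2023 slice")
```

Limiting only the time dimension leaves branch and location unrestricted, which is why the result is a slice through the whole cube.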
And what we can do within our data warehouse is to calculate an average for that specific
slice.
So everything that is within that specific slice can be, for example, averaged or in
general processed with an aggregation function.
So an aggregate function.
Average is just one aggregation function we can talk about.
We can have something like min, we can have something like max, we can have something
like count.
Those are all aggregation functions.
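The aggregation functions named above can be sketched with Python built-ins. The list of values stands in for one measure (here a hypothetical `dollars_sold`) of all entries in a slice; the numbers are made up.

```python
from statistics import mean

# Made-up dollars_sold values of the entries that fall into one slice.
slice_values = [1000.0, 400.0, 700.0]

# Each aggregation function reduces the whole slice to a single number.
aggregates = {
    "avg": mean(slice_values),
    "min": min(slice_values),
    "max": max(slice_values),
    "count": len(slice_values),
}
print(aggregates)
```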
And for all little data entries within a specific slice, or if we specify more than one dimension,
if we specify two dimensions, for example, we limit the branch as well, then we end up
not with a slice, but with a smaller dice if we also limit the branch.
Of course, I should draw it like that.
And we can do the same thing, limiting the data points and aggregating the specific facts
or measurements in all data points within that cube or that slice.
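As a rough sketch of the difference between the two selections: limiting one dimension gives a slice, limiting two gives the smaller dice, and the aggregation works the same way on both. The helper `select` and all records are hypothetical and only serve this example.

```python
from statistics import mean

# Made-up fact records; the time dimension is reduced to a year for brevity.
facts = [
    {"year": 2023, "branch": "X", "location": "A", "dollars_sold": 1000.0},
    {"year": 2023, "branch": "Y", "location": "A", "dollars_sold": 400.0},
    {"year": 2023, "branch": "X", "location": "B", "dollars_sold": 600.0},
    {"year": 2022, "branch": "X", "location": "A", "dollars_sold": 650.0},
]


def select(records, **limits):
    """Keep the records that match every given dimension value."""
    return [r for r in records if all(r[d] == v for d, v in limits.items())]


slice_2023 = select(facts, year=2023)                # one dimension limited
dice_2023_x = select(facts, year=2023, branch="X")   # two dimensions limited

# The same aggregation applies to either selection.
avg_slice = mean(r["dollars_sold"] for r in slice_2023)
avg_dice = mean(r["dollars_sold"] for r in dice_2023_x)
print(avg_slice, avg_dice)
```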
Within OLAP, we have two possible ways to create that within, well, not two possible,
but two main ways to create that within a relational database schema.
One is, of course, what I've already said.
We have a central fact table, and for every dimension, we have one specific table.
And we just link those tables via foreign key to our fact table.
So we do not really have these specific things within the same table, but we have foreign
keys to those dimensions within the fact table.
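The layout just described, a central fact table with one table per dimension linked via foreign keys, can be sketched with Python's built-in `sqlite3` module. The table and column names (`fact_sales`, `dim_time`, and so on) and all rows are assumptions made for this example, not the schema from the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table per dimension (hypothetical names).
CREATE TABLE dim_time (time_id INTEGER PRIMARY KEY, year INTEGER, month INTEGER, day INTEGER);
CREATE TABLE dim_branch (branch_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, city TEXT);

-- The central fact table stores only foreign keys and the measures.
CREATE TABLE fact_sales (
    time_id INTEGER REFERENCES dim_time(time_id),
    branch_id INTEGER REFERENCES dim_branch(branch_id),
    location_id INTEGER REFERENCES dim_location(location_id),
    units_sold INTEGER,
    dollars_sold REAL
);

-- Made-up example rows.
INSERT INTO dim_time VALUES (1, 2023, 4, 1), (2, 2022, 11, 3);
INSERT INTO dim_branch VALUES (1, 'X'), (2, 'Y');
INSERT INTO dim_location VALUES (1, 'A');
INSERT INTO fact_sales VALUES
    (1, 1, 1, 13, 1000.0),
    (1, 2, 1, 5, 400.0),
    (2, 1, 1, 8, 650.0);
""")

# Slice on the time dimension by joining the fact table to dim_time,
# then aggregate the measure over everything inside the slice.
avg_2023, count_2023 = conn.execute("""
    SELECT AVG(f.dollars_sold), COUNT(*)
    FROM fact_sales AS f
    JOIN dim_time AS t ON f.time_id = t.time_id
    WHERE t.year = 2023
""").fetchone()
print(avg_2023, count_2023)
```

The fact table never repeats the year or the branch name itself; the join resolves the foreign key to the dimension table whenever a query needs to limit or group by that dimension.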
Accessible via: Open access
Duration: 01:30:14
Recording date: 2024-05-27
Uploaded on: 2024-05-29 10:56:06
Language: en-US