5 - Knowledge Discovery in Databases [ID:53093]

Okay, welcome to the next lecture in KDD.

Most of you weren't here on Friday, so I guess our entry into the OLAP chapter isn't

known to everybody in here.

I don't think you need that to understand what we are talking about today, but we have

to do a short introduction to bring you up to where we are.

Last week we finished up the pre-processing chapter and we talked about OLAP in some detail.

We stopped at this slide, which is basically the continuation of the different kinds of

schemata we have within an OLAP database.

An OLAP database is an online analytical processing database, which is tailored to analytical

processing, so basically what we are doing in here.

For a database like that, we often use a relational database.

And this relational database basically has a core fact table and some dimensions based

on that fact table.

Basically, you have to think about it this way: if we have three dimensions, then

we have a three-dimensional cube.

So with the three dimensions, let's use time, let's use location, and let's use branch.

Then we can select specific sub-cubes within that bigger cube by limiting specific ranges

within the time.

For example, if we say we only want measurements out of 2023, then we can cut a slice out of

that bigger cube.

And within that bigger cube, we have little measurements which are the basic entries

within that thing.

Basically a measurement within this specific star schema might be something like time,

8 o'clock on the 1st of April, 2023.

So branch is x, location is Erlangen, and then we have three measurements.

One is units sold, let's say 13; one is dollars sold, let's say $1000; and, though I just don't have any space for

it here, average sold as well.

And we have multiple measurements that fall into that specific slice in here.

And what we can do within our data warehouse is to calculate an average for that specific

slice.

So everything that is within that specific slice can be, for example, averaged or in

general processed with an aggregation function.

So an aggregate function.

Average is just one aggregation function we can talk about.

We can have something like min, we can have something like max, we can have something

like count.

Those are all aggregation functions.
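
To make the slice-then-aggregate idea concrete, here is a minimal sketch in plain Python. The fact records, the location names `city_a` and `city_b`, and the second branch `y` are made up for illustration; only the three dimensions and the idea of limiting time to 2023 come from the lecture.

```python
from statistics import mean

# Hypothetical fact records: (year, location, branch, units_sold).
facts = [
    (2022, "city_a", "x", 10),
    (2023, "city_a", "x", 13),
    (2023, "city_a", "y", 7),
    (2023, "city_b", "x", 4),
]

# Slice: limit one dimension (time) to 2023 and keep the measurement.
slice_2023 = [units for (year, _location, _branch, units) in facts if year == 2023]

# Apply several aggregation functions to the same slice.
aggregates = {
    "avg": mean(slice_2023),
    "min": min(slice_2023),
    "max": max(slice_2023),
    "count": len(slice_2023),
}
print(aggregates)
```

The slice only decides which entries take part; the aggregation function is chosen independently of it.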

And that works for all the little data entries within a specific slice. If we specify more than one dimension,

for example two dimensions, by limiting the branch as well, then we end up

not with a slice, but with a smaller dice.

Of course, I should draw it like that.

And we can do the same thing, limiting the data points and aggregating the specific facts

or measurements in all data points within that cube or that slice.
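
The difference between a slice and a dice can be sketched as follows. This is only an illustration under assumptions: the cube is modelled as a Python dictionary keyed by (time, location, branch), and the helper `select`, the cell values, and the names `city_a`, `city_b`, and `y` are hypothetical.

```python
# Hypothetical cube: each cell is keyed by (time, location, branch)
# and holds one measurement (units sold).
cube = {
    (2022, "city_a", "x"): 10,
    (2023, "city_a", "x"): 13,
    (2023, "city_a", "y"): 7,
    (2023, "city_b", "x"): 4,
}

DIMENSIONS = ("time", "location", "branch")


def select(cells, **limits):
    """Return the measurements of all cells that match every given limit."""
    def matches(key):
        coords = dict(zip(DIMENSIONS, key))
        return all(coords[dim] == value for dim, value in limits.items())

    return [value for key, value in cells.items() if matches(key)]


slice_units = select(cube, time=2023)             # one dimension limited: slice
dice_units = select(cube, time=2023, branch="x")  # two dimensions limited: dice

# Any aggregation function can now be applied to either selection.
print(sum(slice_units), sum(dice_units))
```

Every additional limit can only shrink the selection, which is why the dice is a sub-cube of the slice.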

Within OLAP, we have two main ways

to create that within a relational database schema.

One is, of course, what I've already said.

We have a central fact table, and for every dimension, we have one specific table.

And we just link those tables via foreign key to our fact table.

So we do not really store the dimension values themselves within the fact table; instead, we have foreign

keys to those dimensions within the fact table.
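
A central fact table linked to one table per dimension could look roughly like the sketch below, which uses Python's built-in `sqlite3` module. All table names, column names, and rows are assumptions chosen for illustration, not the schema shown on the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- One table per dimension (hypothetical names and columns).
    CREATE TABLE dim_time     (time_id INTEGER PRIMARY KEY, year INTEGER);
    CREATE TABLE dim_location (location_id INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE dim_branch   (branch_id INTEGER PRIMARY KEY, name TEXT);

    -- Central fact table: foreign keys to the dimensions plus the measurements.
    CREATE TABLE fact_sales (
        time_id      INTEGER REFERENCES dim_time(time_id),
        location_id  INTEGER REFERENCES dim_location(location_id),
        branch_id    INTEGER REFERENCES dim_branch(branch_id),
        units_sold   INTEGER,
        dollars_sold REAL
    );

    INSERT INTO dim_time VALUES (1, 2022), (2, 2023);
    INSERT INTO dim_location VALUES (1, 'city_a');
    INSERT INTO dim_branch VALUES (1, 'x'), (2, 'y');
    INSERT INTO fact_sales VALUES
        (1, 1, 1, 10, 800.0),
        (2, 1, 1, 13, 1000.0),
        (2, 1, 2, 7, 500.0);
""")

# Slice on the time dimension via a join, then aggregate the measurements.
avg_units, total_dollars = conn.execute("""
    SELECT AVG(f.units_sold), SUM(f.dollars_sold)
    FROM fact_sales AS f
    JOIN dim_time AS t ON t.time_id = f.time_id
    WHERE t.year = 2023
""").fetchone()
print(avg_units, total_dollars)
```

The fact table stays narrow because it stores only keys and measurements; descriptive attributes such as the year or the city live once in their dimension table.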

Accessible via: Open access

Duration: 01:30:14

Recording date: 2024-05-27

Uploaded on: 2024-05-29 10:56:06

Language: en-US
