And hopefully we are ready.
I'll kind of check that.
Now we are able to start with some delay.
Sorry for that, but we now have a recording, a rudimentary one, but we can still record the lecture.
Okay, today we will talk about the rest of the preprocessing lecture.
We stopped, I think, on that slide; correct me if I'm wrong.
We talked about PCA and the wavelet transform last lecture, if I'm correct.
Can you give me a quick thumbs up or thumbs down?
Okay.
And we will continue with that.
We will get through that as fast as possible because of time constraints.
And the next thing is we will talk about OLAP today, Online Analytical Processing, which
is a really quick lecture.
Originally it was designed to fit into the guest lecture slot, so only 45 minutes or so.
So it should work perfectly today.
But yeah, let's start.
Are there any questions regarding last week's lecture?
Okay.
Then we can start with the, oh, one thing I should check is the voice recording.
Sorry again for that.
But without voice, you might get...
Test test?
Yeah, okay.
Okay.
We can get to our data reduction topic.
We already talked about how we can reduce correlation between attributes by using, for
example, wavelet transform or the principal component analysis.
As I already said, PCA is the more important and more frequently used method in our field.
If we are working with relational data sets, we use PCA much more often than the wavelet transform.
The wavelet transform is more often used with images and similar signal data, but it can also be applied to plain numeric attributes; as we have seen, we covered the discrete wavelet transform last time.
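To make the PCA side of this concrete, here is a minimal sketch of projecting numeric attributes onto their top principal components via an eigendecomposition of the covariance matrix. The data values are made up for illustration; in practice you would use a library implementation such as scikit-learn's `PCA`.

```python
import numpy as np

def pca_reduce(X, k):
    """Project the rows of X onto the top-k principal components.

    X: (n_samples, n_features) numeric array.
    Returns the (n_samples, k) reduced representation.
    """
    # Center each attribute (column) around its mean.
    Xc = X - X.mean(axis=0)
    # Covariance matrix of the attributes.
    cov = np.cov(Xc, rowvar=False)
    # eigh is appropriate here because the covariance matrix is symmetric.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Sort components by descending variance and keep the top k.
    order = np.argsort(eigvals)[::-1][:k]
    components = eigvecs[:, order]
    return Xc @ components

# Two strongly correlated attributes collapse well onto one component.
X = np.array([[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8]])
Z = pca_reduce(X, k=1)
```

Because the two toy attributes are almost perfectly correlated, a single component retains nearly all of the variance, which is exactly the data-reduction effect discussed above.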
Something we didn't discuss yet is how to handle string data, for example.
String data, of course, cannot be handled by PCA or by the wavelet transform.
So we have to get around that in some way as well.
One pretty obvious way to reduce dimensionality on string data is to look for duplicate attributes.
For example, if we have an attribute name and an attribute surname, or an attribute full name and an attribute first name, there is some degree of redundancy in these attributes, and we can just select a subset of them to go on with.
This is a highly manual process.
We can, of course, apply methods that suggest a subset of attributes likely to be redundant, but we still have to pick which attributes we keep and which we can dismiss.
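A small sketch of what such a semi-automatic redundancy check could look like: flag a string attribute as a candidate for removal when it is always derivable from other attributes. The column names (`first_name`, `surname`, `full_name`) and the joining rule are assumptions for illustration; a flagged attribute would still need manual review before dropping it.

```python
# Toy records with a derived attribute (values are made up).
records = [
    {"first_name": "Ada", "surname": "Lovelace", "full_name": "Ada Lovelace"},
    {"first_name": "Alan", "surname": "Turing", "full_name": "Alan Turing"},
]

def is_derived(records, part_keys, whole_key):
    """Flag whole_key as redundant if it always equals the joined parts."""
    return all(
        " ".join(r[k] for k in part_keys) == r[whole_key] for r in records
    )

# If this flags full_name as derived, we could drop it and keep the
# parts (or the other way around), shrinking the attribute set by one.
redundant = is_derived(records, ["first_name", "surname"], "full_name")
```

The final decision of which attribute to dismiss remains with the analyst, matching the manual character of the process described above.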
Also pretty obvious: we can simply drop irrelevant attributes.
Why should we keep attributes in our data set if we already know they are not relevant?
Accessible via: open access
Duration: 01:24:57 min
Recording date: 2024-05-24
Uploaded: 2024-05-27 13:15:28
Language: en-US