And hopefully we are ready.
I'll kind of check that.
Now we are able to start with some delay.
Sorry for that, but we now have a recording, a rudimentary one, but we can still record the lecture.
Okay, today we will talk about the rest of the preprocessing lecture.
We stopped, I think, on that slide; correct me if I'm wrong.
We talked about PCA and the wavelet transform last lecture, if I'm correct.
Can you give me a quick thumbs up or thumbs down?
Okay.
And we will continue with that.
We will get through that as fast as possible because of time constraints.
And the next thing is we will talk about OLAP today, Online Analytical Processing, which
is a really quick lecture.
Originally it was designed to fit into the guest lecture slot, so only 45 minutes or so.
So it should work perfectly today.
But yeah, let's start.
Are there any questions regarding last week's lecture?
Okay.
Then we can start with the, oh, one thing I should check is the voice recording.
Sorry again for that.
But without voice, you might get...
Test test?
Yeah, okay.
Okay.
We can get to our data reduction topic.
We already talked about how we can reduce correlation between attributes by using, for
example, wavelet transform or the principal component analysis.
As I already said, PCA is the more important and more frequently used method in our field.
If we are working with relational data sets, we use PCA much more often than the wavelet transform.
The wavelet transform is more often used with images and similar signal data, but it can also be applied to plain numeric attributes; as we have seen, we covered the discrete wavelet transform last time.
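To make the PCA side of this concrete, here is a minimal sketch of projecting numeric attributes onto their top principal components via an eigendecomposition of the covariance matrix. The data values are made up for illustration; in practice you would use a library implementation such as scikit-learn's `PCA`.

```python
import numpy as np

def pca_reduce(X, k):
    """Project the rows of X onto the top-k principal components.

    X: (n_samples, n_features) numeric array.
    Returns the (n_samples, k) reduced representation.
    """
    # Center each attribute (column) around its mean.
    Xc = X - X.mean(axis=0)
    # Covariance matrix of the attributes.
    cov = np.cov(Xc, rowvar=False)
    # eigh is appropriate here because the covariance matrix is symmetric.
    eigvals, eigvecs = np.linalg.eigh(cov)
    # Sort components by descending variance and keep the top k.
    order = np.argsort(eigvals)[::-1][:k]
    components = eigvecs[:, order]
    return Xc @ components

# Two strongly correlated attributes collapse well onto one component.
X = np.array([[1.0, 2.1], [2.0, 3.9], [3.0, 6.2], [4.0, 7.8]])
Z = pca_reduce(X, k=1)
```

Because the two toy attributes are almost perfectly correlated, a single component retains nearly all of the variance, which is exactly the data-reduction effect discussed above.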
Something we didn't discuss yet is how to handle string data, for example.
String data, of course, cannot be handled by PCA or by the wavelet transform.
So we have to get around that in some way as well.
One pretty obvious way to reduce dimensionality on string data is to look for duplicate attributes.
For example, if we have an attribute name and an attribute surname, or an attribute full name and an attribute first name, there is some degree of redundancy in these attributes, and we can just select a subset of them to go on with.
This is a highly manual process.
We can, of course, apply methods that suggest a subset of attributes likely to be redundant, but we still have to pick which attributes we keep and which we can dismiss.
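A small sketch of what such a semi-automatic redundancy check could look like: flag a string attribute as a candidate for removal when it is always derivable from other attributes. The column names (`first_name`, `surname`, `full_name`) and the joining rule are assumptions for illustration; a flagged attribute would still need manual review before dropping it.

```python
# Toy records with a derived attribute (values are made up).
records = [
    {"first_name": "Ada", "surname": "Lovelace", "full_name": "Ada Lovelace"},
    {"first_name": "Alan", "surname": "Turing", "full_name": "Alan Turing"},
]

def is_derived(records, part_keys, whole_key):
    """Flag whole_key as redundant if it always equals the joined parts."""
    return all(
        " ".join(r[k] for k in part_keys) == r[whole_key] for r in records
    )

# If this flags full_name as derived, we could drop it and keep the
# parts (or the other way around), shrinking the attribute set by one.
redundant = is_derived(records, ["first_name", "surname"], "full_name")
```

The final decision of which attribute to dismiss remains with the analyst, matching the manual character of the process described above.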
Also pretty obvious: we can simply drop irrelevant attributes.
Why should we keep attributes in our data set if we already know they are not relevant?
Accessible via: open access
Duration: 01:24:57 min
Recording date: 2024-05-24
Uploaded: 2024-05-27 13:15:28
Language: en-US