2 - Knowledge Discovery in Databases [ID:52866]

Okay.

Welcome to KDD once again.

I'm sorry for last week.

I was too sick to give the lecture.

Sorry for that.

I hope everybody read my mail before coming here and not after coming here.

So everybody was at home at the time and not in this lecture hall.

Regarding the lost time, we decided to not reduce the content of the lecture but add

an additional date for an extra lecture.

However, this date is not set yet, and we will make sure that there is a recording of that

lecture, so everybody who isn't able to be here in person for that lecture

will have a recording to look at.

It will probably be in three or four weeks' time, not this week, not next week, but in

some weeks' time.

A positive note is that I changed my laser pointer so now everybody should be able to

see my pointer.

I think that's good as well.

We will restart at the point where we stopped two weeks ago.

So we will start at the slide measuring data similarity and dissimilarity of the data lecture.

And we will today talk about the last slides of the data lecture and then we'll go on with

the pre-processing lecture.

Okay, are there any questions unrelated to today's slides?

Okay, then let's start.

Okay, we already talked about what data is in general.

Now we will have to talk about how data objects compare to each other within our data sets.

And why do we need that?

Because we need similarity and dissimilarity for specific applications, specific methods like

classification, clustering, and outlier analysis.

We will need a specific measure to decide whether two data sets or two data tuples are

similar to each other or not similar to each other.

And ideally we do not want a binary scale, so identical or not identical,

but we want to have a continuous measure of the similarity of two tuples.

We will talk about each of these methods in later lectures, so classification will be

part of lecture seven, clustering will be part of lecture eight, and outlier analysis will

be part of lecture nine.

We will talk about those things later on in the semester, therefore I will skip the definition

of clusters as well because we will talk about that again in lecture eight.

Now to similarity and dissimilarity.

Of course everybody has an opinion on what similarity and dissimilarity are.

For example, are you similar to your seat neighbor?

Hard to measure.

We need a specific definition of similarity to measure it.

Similarity is defined as how alike two data objects are.

In most cases we will choose a value between zero and one, including zero and one.

But in some cases you will have other intervals as well.

The higher the value is, the more alike, the more similar the two objects are, typically.

Dissimilarity on the other hand is often a synonym of distance.

How far apart are two data objects from each other?
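
A minimal sketch of one such measure, assuming tuples of nominal attributes and a simple matching coefficient. The lecture has not named a specific formula at this point, so the function name and the two example tuples are hypothetical and only illustrate a similarity value in the closed interval from zero to one:

```python
def simple_matching_similarity(a, b):
    """Return the fraction of attribute positions on which two tuples agree.

    The result lies between 0.0 (no attribute matches) and 1.0 (identical).
    This is one possible similarity measure, chosen here as an assumption.
    """
    if len(a) != len(b):
        raise ValueError("tuples must have the same number of attributes")
    if not a:
        raise ValueError("tuples must have at least one attribute")
    matches = sum(1 for x, y in zip(a, b) if x == y)
    return matches / len(a)


# Hypothetical data tuples: (field of study, city, degree)
student_a = ("computer science", "Erlangen", "bachelor")
student_b = ("computer science", "Nuremberg", "bachelor")

sim = simple_matching_similarity(student_a, student_b)
print(sim)  # two of the three attributes match
```

Unlike a binary identical or not identical check, the value moves gradually toward one as more attributes agree.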

So for example, if we have two points in a coordinate system, we have two dimensions.

One dimension is age and one dimension is salary.
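
The two-point example can be sketched as follows, reading the two dimensions as age and salary and assuming the Euclidean distance as the dissimilarity measure. The transcript does not state which distance is meant here, and the two points are made-up values:

```python
import math


def euclidean_distance(p, q):
    """Straight-line distance between two points of equal dimension."""
    if len(p) != len(q):
        raise ValueError("points must have the same number of dimensions")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))


# Hypothetical data objects as (age, salary) points
person_a = (30, 3000)
person_b = (33, 3004)

distance = euclidean_distance(person_a, person_b)
print(distance)  # 0.0 would mean the two objects coincide
```

Here a larger value means the two objects are further apart, so it behaves the opposite way to a similarity value.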


Accessible via

Open access

Duration

01:32:06 min

Recording date

2024-05-06

Uploaded on

2024-05-07 11:46:05

Language

en-US
