9 - Knowledge Discovery in Databases

Okay, welcome.

Before we continue with the chapter on classification and finish it up, I want to talk briefly about the submissions.

The first submission ended yesterday evening.

I guess a lot of you did great.

We had 16 teams with full points, which is great.

However, there have been some technical difficulties.

The points weren't displayed correctly early on in the submission.

And yesterday I set a wrong deadline.

Because of these two things, and because this is a first-iteration submission (the submission sheets were created for the first time this semester),

I decided to lower the points needed to unlock the mock exam from 75% to 60%.

So you will need fewer points overall to unlock the mock exam.

Because I don't want you to have problems due to technical difficulties or things like that.

I think that's a fair change.

So overall you will need 60 points across all submissions.

Not every single submission has to exceed 60% on its own, but in total you have to reach 60% of the points to unlock the mock exam later in the semester.

I will change this percentage in the student course later today once I put in the points

of this submission as well.

And the second submission on the topic of classification will start today and it will

have 50 points.

So there are more points to earn in the second submission.

So even if you didn't perform well in the first submission, you can still unlock the mock exam.

Okay, with that said, we can go back to the chapter of classification and finish it up.

If I remember correctly, we stopped after briefly explaining the significance of t-tests and didn't yet get to receiver operating characteristics in the last lecture.

Is that correct?

Because I'm not 100% sure about that.

Does anyone remember it differently?

Okay, perfect.

Then let's start with this very important kind of curve.

The ROC curve is very often used to describe the trade-off between a model's true positive rate and its false positive rate.

Most often, for example, you can try to train your model to always find all instances of the positive class.

To use the same example again: if you have a COVID-19 test, we want to identify everybody who has COVID with our classifier, in this case the COVID test, to be really sure that nobody who has COVID gets a negative result.

So in this case, we try to get a high true positive rate.

In some cases, however, we want to focus on getting as low a false positive rate as possible.

So we try to find a way to never predict a class for an instance that does not actually belong to that class.

These two goals are often diametrically opposed, so you can either optimize for one or optimize for the other.
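
To make this trade-off concrete, here is a minimal sketch (my own illustration, not from the lecture, using made-up toy labels and scores): lowering the decision threshold catches more true positives but also lets in more false positives.

```python
# Sketch: how the decision threshold moves TPR and FPR in opposite directions.
import numpy as np

def tpr_fpr(y_true, scores, threshold):
    """Compute true positive rate and false positive rate at a given threshold."""
    y_pred = (scores >= threshold).astype(int)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    return tp / (tp + fn), fp / (fp + tn)

# Toy labels and classifier scores, made up purely for illustration.
y_true = np.array([1, 1, 1, 0, 0, 0, 1, 0])
scores = np.array([0.9, 0.8, 0.4, 0.7, 0.3, 0.2, 0.6, 0.5])

for t in (0.75, 0.45):
    tpr, fpr = tpr_fpr(y_true, scores, t)
    print(f"threshold={t}: TPR={tpr:.2f}, FPR={fpr:.2f}")
```

With these toy numbers, the stricter threshold of 0.75 gives TPR 0.50 and FPR 0.00, while the looser threshold of 0.45 gives TPR 0.75 and FPR 0.50, so improving one rate comes at the cost of the other.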

And if you draw the true positive rate and false positive rate for your specific classifier out on a curve, it might look something like this ROC curve, where the perfect classifier sits in the top-left corner, with a true positive rate of one and a false positive rate of zero.
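
As a sketch of how such a curve can be drawn in practice (my own example on synthetic data, not the plot shown in the lecture), scikit-learn's roc_curve sweeps over all decision thresholds and returns the matching false and true positive rates:

```python
# Sketch: drawing a ROC curve for a logistic regression on synthetic data.
import matplotlib.pyplot as plt
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic binary classification data, assumed just for illustration.
X, y = make_classification(n_samples=1000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]  # probability of the positive class

fpr, tpr, thresholds = roc_curve(y_test, scores)
auc = roc_auc_score(y_test, scores)

plt.plot(fpr, tpr, label=f"classifier (AUC = {auc:.2f})")
plt.plot([0, 1], [0, 1], linestyle="--", label="random guessing")
plt.xlabel("False positive rate")
plt.ylabel("True positive rate")
plt.legend()
plt.show()
```

The dashed diagonal corresponds to random guessing; the closer the curve bends toward the top-left corner, the better the classifier.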
