2 - Ex01 Bayesian Classifier [ID:23734]

Welcome everybody to the pattern recognition exercises. Today we are going to be solving

the first set of exercises corresponding to the topic Bayesian classifier. So during the session

you can ask questions after I explain each of the exercises, but you can also interrupt if you feel

it necessary, although it is better if you ask at the end. Okay, I just forgot to activate the video. Okay,

so do you have any questions before I start? Okay, you can ask either via the chat or you

can use the microphone. Let me see the chat. Okay, yeah, okay, we do not have any questions. Yeah,

the meeting is being recorded. Okay, so I will start solving the exercises. And first I'll go

to the problem statement. So the problem statement is as follows. A student noticed that two-thirds

of the emails that she received were spam. And she decided to use a Bayesian classifier to

determine whether an incoming email is spam or ham. And by inspecting all her previous emails,

she noticed that certain words occurred with different probabilities. Those

words were biogram, bed, student, sports, and cinema. The student then estimated that

the word biogram occurred in 50% of the spam emails, bed in 30%, student in 5%,

sports in 2%, and cinema also in 2%. In the ham emails, on the contrary, the probabilities

were different. Biogram was present in 0% of the ham emails, so it did not occur at all. Bed

was present in 10%, student in 40%, sports in 30%, and cinema in 10% of the emails. And then the

student only registered whether the word was present or not. And for simplicity, she assumed

that the words occurred independently. So do you have questions in the problem statement?
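
The model described in the problem statement can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `PRIORS`, `P_WORD`, `to_features`, and `posterior` are hypothetical, the ham prior of one-third is taken as the complement of the spam prior, and an absent word is assumed to contribute the factor (1 - p), which follows from registering only presence or absence together with the independence assumption.

```python
import re

# Priors: two-thirds of the emails are spam, so one-third are ham.
PRIORS = {"spam": 2 / 3, "ham": 1 / 3}

# Feature order is fixed by this list.
WORDS = ["biogram", "bed", "student", "sports", "cinema"]

# Probability that each word is present, per class (values from the problem statement).
P_WORD = {
    "spam": [0.50, 0.30, 0.05, 0.02, 0.02],
    "ham": [0.00, 0.10, 0.40, 0.30, 0.10],
}


def to_features(text):
    """Binary features: 1 if the word occurs at all, 0 otherwise.

    Word order and word counts are deliberately ignored.
    """
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return [1 if word in tokens else 0 for word in WORDS]


def posterior(x):
    """Posterior P(class | x) under the independence assumption."""
    joint = {}
    for cls, prior in PRIORS.items():
        p = prior
        for present, p_word in zip(x, P_WORD[cls]):
            # A present word contributes p_word, an absent word 1 - p_word.
            p *= p_word if present else 1 - p_word
        joint[cls] = p
    evidence = sum(joint.values())
    return {cls: joint[cls] / evidence for cls in joint}


if __name__ == "__main__":
    x = to_features("New student sports program starts on Monday")
    print(x, posterior(x))
```

Because biogram never occurs in ham emails, any email that contains it gets a ham posterior of exactly zero in this sketch, regardless of the other words.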

Okay, I can proceed with the first question. The first question is, has the student considered

all the postulates of pattern recognition? So let's review one by one. So the first postulate

is about having a representative sample. And in this case, we can say that the student did

consider this postulate because the probabilities were estimated from all the previous emails. So the

priors and the conditionals were estimated using all the previous emails. So the second postulate

is about the features. And then we can also say that it was considered by the student because the

words in the emails characterize the classes. So we have, as we saw in the table, we have these

features and we can see that they occur with different probabilities in each of

the classes, so we can say that the words in the emails characterize the classes.

Okay, so postulate number three says that we should have a compact domain.

And yes, the student did choose words with different probabilities in the two classes, as I show in the table,

which relates to both the second and the third postulate. And then the fourth postulate is about

having simpler constituents or parts. And yes, the student characterized the text in words and

she didn't try to use the complete email as a feature. So the text is expressed in simpler

elements or parts. In this case, she used five parts and not the complete set of words that would come

with the complete email. So for the fifth postulate, we have the structure. And the student did not

consider the ordering of the words or the sentence structure or grammar. So she didn't care if bed

appeared before student or biogram appeared before sports. She only registered whether

they occurred or not. Also, the number of times that a word was present in the email was not

considered, only whether it occurred or not. So we have binary variables. The sixth

postulate is about the similarity. This postulate says that if we have two patterns, two samples

from the same class, in this case two emails, they should be similar. So if the words in the

emails are similar, then the simpler representation is also going to be similar.

So for example, if we have two spam emails, we expect that their simpler representations with

these five features are going to be more similar to each other than to elements from the other class.

Yeah, so the patterns are going to be similar if they are from the same class.
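
One way to check this claim numerically is to compare the expected number of differing features between two emails. The following is a rough sketch, not something from the exercise itself: the Hamming distance is an assumed similarity measure, the name `expected_hamming` is hypothetical, and the two emails are assumed to be drawn independently with the word probabilities from the problem statement.

```python
# Probability that each word is present, per class
# (order: biogram, bed, student, sports, cinema).
P_WORD = {
    "spam": [0.50, 0.30, 0.05, 0.02, 0.02],
    "ham": [0.00, 0.10, 0.40, 0.30, 0.10],
}


def expected_hamming(p, q):
    """Expected number of differing binary features between two independent
    emails whose word-presence probabilities are p and q."""
    # A feature differs if it is present in exactly one of the two emails.
    return sum(a * (1 - b) + b * (1 - a) for a, b in zip(p, q))


within_spam = expected_hamming(P_WORD["spam"], P_WORD["spam"])
within_ham = expected_hamming(P_WORD["ham"], P_WORD["ham"])
between = expected_hamming(P_WORD["spam"], P_WORD["ham"])

if __name__ == "__main__":
    print(f"spam vs spam: {within_spam:.4f}")
    print(f"ham vs ham:   {within_ham:.4f}")
    print(f"spam vs ham:  {between:.4f}")
```

Under these assumptions the expected distance between a spam and a ham email is larger than the expected distance within either class, which is consistent with the similarity postulate.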

So do you have questions? Okay. We don't have similarity in our case, right? So yeah,

we do have similarity in our case. Because, as is expressed here, for example, let's consider

the first feature. We have the feature biogram, which only occurs in spam emails. It does not

occur in ham emails. So that means that if you have a spam email and a

ham email, most likely it is the ham email that is going to have a zero in the first element. So here,

this is only the set of features. But then at the end, you are going to represent the sample with,

Access: open access
Duration: 01:29:18
Recording date: 2020-11-13
Uploaded: 2020-11-13 21:07:42
Language: en-US

Solution and explanation for the first set of theoretical exercises. 
