Welcome, everybody, to the pattern recognition exercises. Today we are going to solve
the first set of exercises, corresponding to the topic Bayesian classifier. During the session
you can ask questions after I explain each of the exercises, but you can also interrupt if you feel
it is necessary. It is better if you ask at the end, though. Okay, I just forgot to activate the video.
Okay, so do you have any questions before I start? You can ask either via the chat or you
can use the microphone. Let me see the chat. Okay, we do not have any questions. Yeah,
the meeting is being recorded. Okay, so I will start solving the exercises. First I'll go
to the problem statement.

The problem statement is as follows. A student noticed that two-thirds
of the emails that she received were spam, and she decided to use a Bayesian classifier to
determine whether an incoming email is spam or ham. By inspecting all her previous emails,
she noticed that certain words occurred with different probabilities. Those
words were biogram, bed, student, sports, and cinema. The student estimated that
the word biogram occurred in 50% of the spam emails, bed in 30%, student in 5%,
sports in 2%, and cinema also in 2%. In the ham emails, on the contrary, the probabilities
were different. Biogram was present in 0% of the emails, so it was not present at all. Bed
was present in 10%, student in 40%, sports in 30%, and cinema in 10% of the emails. The
student only registered whether a word was present or not, and for simplicity she assumed
that the words occurred independently. So do you have questions on the problem statement?
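
The numbers from the problem statement can be put into a small classifier. The following is only a sketch, not the solution worked out in the session: it assumes a Bernoulli model in which an absent word contributes a factor (1 - p), and it assumes a ham prior of one-third, since spam and ham are the only two classes. The names `PRIOR`, `P_WORD`, and `posterior` are made up for illustration.

```python
# Word order used for the binary feature vector.
WORDS = ["biogram", "bed", "student", "sports", "cinema"]

# Priors: two-thirds spam is stated; one-third ham follows for two classes.
PRIOR = {"spam": 2 / 3, "ham": 1 / 3}

# P(word present | class), as stated in the problem statement.
P_WORD = {
    "spam": [0.50, 0.30, 0.05, 0.02, 0.02],
    "ham": [0.00, 0.10, 0.40, 0.30, 0.10],
}


def posterior(x):
    """Return P(class | x) for a binary presence vector x.

    Assumption: words are independent given the class, and an
    absent word contributes the factor (1 - p).
    """
    joint = {}
    for cls, prior in PRIOR.items():
        p = prior
        for present, p_word in zip(x, P_WORD[cls]):
            p *= p_word if present else (1.0 - p_word)
        joint[cls] = p
    total = sum(joint.values())  # evidence P(x)
    return {cls: joint[cls] / total for cls in joint}


# An email that contains only "biogram" can never be ham under this model,
# because P(biogram | ham) is zero.
print(posterior([1, 0, 0, 0, 0]))
# An email that contains only "student" leans towards ham.
print(posterior([0, 0, 1, 0, 0]))
```

The independence assumption is what allows the class-conditional probability to be written as a simple product over the five words.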
Okay, I can proceed with the first question. The first question is: has the student considered
all the postulates of pattern recognition? Let's review them one by one.

The first postulate is about having a representative sample. In this case, we can say that the
student did consider this postulate, because the priors and the conditionals were estimated
using all the previous emails.

The second postulate is about the features. We can also say that it was considered by the
student, because the words in the emails characterize the classes. As we saw in the table, these
features occur with different probabilities in each of the classes.

A compact domain.
In postulate number three, we have that the features should have a compact domain.
And yes, the student did choose words with different probabilities in the two classes, as shown
in the table; this is closely related to both of these postulates, the second and the third.

The fourth postulate is about having simpler constituents or parts. And yes, the student split
the text into words and did not try to use the complete email as a feature. So the text is
expressed in simpler elements or parts. In this case she uses five parts and not the complete
set of words that the complete email could contain.

For the fifth postulate, we have the structure. The student did not consider the ordering of
the words or the sentence structure or grammar. So she didn't care if bed appeared before
student or biogram appeared before sports. She only registered whether they occurred or not.
The number of times that a word was present in the email was also not considered, only whether
it occurred or not. So we have binary variables.

The sixth postulate is about similarity. This postulate says that if we have two patterns, two
samples from the same class, in this case two emails, they should be similar. So if the words
in the emails are similar, then the simpler representation is going to be similar as well.
So for example, if we have two spam emails, we expect that their simpler representations with
these five features are going to be more similar to each other than to the representation of
an email from the other class. So the patterns are going to be similar if they are from the
same class.
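
The binary representation and the idea of similarity can be made concrete with a short sketch. This is an illustration under assumptions, not something shown in the session: the helper names `to_features` and `hamming` are hypothetical, the example emails are invented, and the Hamming distance is just one possible way to compare two binary vectors.

```python
import re

# Word order used for the binary feature vector.
WORDS = ["biogram", "bed", "student", "sports", "cinema"]


def to_features(text):
    """Map an email text to a binary presence vector.

    Only presence is registered: word order and the number of
    occurrences are ignored, as in the problem statement.
    """
    tokens = set(re.findall(r"[a-z]+", text.lower()))
    return [1 if word in tokens else 0 for word in WORDS]


def hamming(a, b):
    """Number of positions in which two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


# Invented example emails, for illustration only.
spam_a = to_features("Cheap biogram, go to bed happy")
spam_b = to_features("biogram biogram biogram")
ham_a = to_features("Student sports and cinema night")

print(hamming(spam_a, spam_b))  # distance between the two spam-like emails
print(hamming(spam_a, ham_a))   # distance between a spam-like and a ham-like email
```

A smaller distance means that the two emails have a more similar representation in terms of the five words.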
So do you have questions? Okay: "We don't have similarity in our case, right?" So yeah,
we do have similarity in our case. As expressed here, for example, let's consider
the first feature. We have the feature biogram, which only occurs in spam emails. It does not
occur in ham emails. So that means that if you have an email from spam and an email from
ham, the ham email is going to have a zero in the first element. So here,
this is only the set of features. But then at the end, you are going to represent the sample with,
Access: open access
Duration: 01:29:18
Recording date: 2020-11-13
Uploaded: 2020-11-13 21:07:42
Language: en-US
Solution and explanation for the first set of theoretical exercises.