Apparently the question on n-grams didn't go quite so well.
Was there a problem with the input again?
Or is 6 to the power of 3 hard to compute in your head?
Okay then let's maybe do a very quick recap because there's been quite a lot of stuff
for which I wasn't even here.
N-grams.
For trigrams, yes we do consider the order.
It does make a difference whether we have c_i given c_{i-2} and c_{i-1} or those words in some other order.
Those are three different words, right?
So, I don't know, let's say this is something like "the dog ate" or whatever; then "the dog ate" is a different trigram than, say, "ate the dog".
That would be a different trigram.
So over a language with six words, it's literally just any combination you can form consisting of three words, i.e. six choices times six choices times six choices. Yes.
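To make the counting concrete, here is a minimal sketch, with a made-up six-word vocabulary, that enumerates every ordered three-word combination:

```python
from itertools import product

# Hypothetical six-word vocabulary, purely for illustration.
vocab = ["the", "a", "dog", "cat", "ate", "saw"]

# Every ordered combination of three words is its own trigram,
# so there are 6 * 6 * 6 = 216 of them.
trigrams = list(product(vocab, repeat=3))
print(len(trigrams))                      # 216
print(("the", "dog", "ate") in trigrams)  # True
print(("ate", "the", "dog") in trigrams)  # True, and it is a different trigram
print(("dog", "dog", "dog") in trigrams)  # True as well; repeats are allowed
```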
Oh yeah, sure. Dog, dog, dog is a perfectly valid English string.
It's not grammatical, but nobody says that 3-grams need to be grammatical, right?
The grammaticality of trigrams is something you basically get via these models.
So if you have a decent trigram model over an English set of words, you would expect the probability of the word dog given dog and dog to be very small.
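As a rough sketch of where that small probability comes from: in a count-based, maximum-likelihood trigram model, P(w | u, v) is just how often the context (u, v) is followed by w in the training data, so a context like (dog, dog) that essentially never continues with dog gets a vanishing probability (here, with no smoothing, exactly zero). The tiny corpus below is made up for illustration:

```python
from collections import Counter

# Made-up toy corpus, just to illustrate the counting.
corpus = "the dog ate the cat and the cat saw the dog".split()

trigram_counts = Counter(zip(corpus, corpus[1:], corpus[2:]))
bigram_counts = Counter(zip(corpus, corpus[1:]))

def trigram_prob(w, u, v):
    """Maximum-likelihood estimate of P(w | u, v) = count(u, v, w) / count(u, v)."""
    if bigram_counts[(u, v)] == 0:
        return 0.0  # unseen context; a real model would smooth here
    return trigram_counts[(u, v, w)] / bigram_counts[(u, v)]

print(trigram_prob("ate", "the", "dog"))  # 0.5: "the dog" occurs twice, followed by "ate" once
print(trigram_prob("dog", "dog", "dog"))  # 0.0: the context (dog, dog) never occurs
```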
Yeah.
In Bayesian networks, we also consider an order of words. Is that the same order?
I wouldn't think of those as being analogous to Bayesian networks.
I would rather think of them as basically hidden Markov models, or rather a stochastic process, in the sense that every one of those c_i is basically a sample from the same distribution, because it's just a random word of some language.
So, yeah.
And again, the fact that some of them are extremely unlikely, or rather impossible in proper English grammar, is something you get from this probability model rather than by excluding them from the get-go.
Because if you think about it, I can easily form sentences that contain the same word twice, even if it's just pronouns or whatever.
Yeah.
Okay.
Apart from that, are we happy with n-grams in general?
Okay.
Obviously, we can use that for classification tasks.
We just build an n-gram model for whatever class we're interested in and then compute the likelihood of a given string under that class model.
Then we just do the usual Bayesian stuff to get the class given the string.
Okay.
Other applications of character n-gram models.
Basically, any kind of classification task.
Build an n-gram model for every class and then just compare the probabilities.
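A minimal sketch of that recipe, with made-up classes and training strings, and add-one smoothing so unseen characters don't zero out a score; a real system would of course use much more data. The idea: train one character bigram model per class, score the input under each, and pick the class with the highest posterior via Bayes.

```python
import math
from collections import Counter

def train_char_bigram(texts, alpha=1.0):
    """Add-alpha-smoothed character bigram model; returns a function for P(c2 | c1)."""
    bigrams, contexts, alphabet = Counter(), Counter(), set()
    for t in texts:
        t = "^" + t  # "^" marks the start of a string
        alphabet.update(t)
        bigrams.update(zip(t, t[1:]))
        contexts.update(t[:-1])
    V = len(alphabet)
    def prob(c1, c2):
        return (bigrams[(c1, c2)] + alpha) / (contexts[c1] + alpha * V)
    return prob

def log_likelihood(text, model):
    text = "^" + text
    return sum(math.log(model(c1, c2)) for c1, c2 in zip(text, text[1:]))

# Made-up training data for two hypothetical classes.
training = {
    "english": ["the dog ate the cat", "the cat saw the dog"],
    "german":  ["der hund sah die katze", "die katze sah den hund"],
}
models = {cls: train_char_bigram(texts) for cls, texts in training.items()}
priors = {cls: 1.0 / len(training) for cls in training}  # uniform class prior

def classify(text):
    # Bayes: argmax over classes of log P(text | class) + log P(class);
    # the evidence P(text) is the same for every class and cancels.
    return max(models, key=lambda c: log_likelihood(text, models[c]) + math.log(priors[c]))

print(classify("the dog"))   # expected: "english"
print(classify("der hund"))  # expected: "german"
```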
Yes.
Can I say that it's a naive Bayesian classifier kind of thing?
It's naturally related just by virtue of us using lots of Bayes in there.
It's not quite a naive Bayesian model, because we don't necessarily make the same assumption:
in a naive Bayesian model, we would assume that the observable variables are all conditionally independent, which in this case we do not do.
So you can, of course, map the classes to some hidden state, i.e. exactly the cause variable of the naive Bayesian network.
On that level, the analogy still works.
But we only have, well, a trivial naive Bayesian network with a single observable variable, which is the current word, and then that is mapped along a time structure.
In that sense, you would have that.
But once you add the time structure, you basically have one observable variable for each word, or for each time step in a string.
And those are definitely not independent.
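To make that contrast concrete, here is a rough sketch of the two likelihoods of a word sequence given a class; word_prob and bigram_prob are placeholders for whatever class-conditional estimates you have. The naive Bayes version treats the words as conditionally independent given the class, while the bigram (Markov) version conditions each word on its predecessor, which is exactly the dependence the naive model drops.

```python
import math

def naive_bayes_log_likelihood(words, word_prob):
    # Naive Bayes: log P(w_1 .. w_n | class) = sum_i log P(w_i | class);
    # the observable variables are conditionally independent given the class.
    return sum(math.log(word_prob(w)) for w in words)

def bigram_log_likelihood(words, bigram_prob, start="<s>"):
    # Markov / bigram view: log P(w_1 .. w_n | class) = sum_i log P(w_i | w_{i-1}, class);
    # each word depends on the previous one, so the observables are not independent.
    total, prev = 0.0, start
    for w in words:
        total += math.log(bigram_prob(w, prev))
        prev = w
    return total

# Toy stand-in probabilities, purely for illustration.
toy_word_prob = lambda w: 0.1
toy_bigram_prob = lambda w, prev: 0.2 if prev != "<s>" else 0.1

print(naive_bayes_log_likelihood(["the", "dog", "ate"], toy_word_prob))
print(bigram_log_likelihood(["the", "dog", "ate"], toy_bigram_prob))
```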