26 - Artificial Intelligence II

Okay.

I apologize for being late.

So, we're still looking at, you could say, classical natural language processing.

One of the big things in there is measuring success.

We talked about precision and recall, the most commonly used measures.

The thing you should probably realize is that precision and recall aren't quite mathematically dual.

We're always looking at the true positives, but over two different denominators: true positives plus false negatives in one case, true positives plus false positives in the other.

So the choice of denominators might seem a bit unsystematic. Normally, if we have two such dual notions, they are also mathematically dual, but these are not.
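For reference, these are the standard definitions, with TP, FP, and FN denoting the counts of true positives, false positives, and false negatives (the notation is the usual one, not taken from the lecture itself):

```latex
\[
\text{precision} = \frac{TP}{TP + FP},
\qquad
\text{recall} = \frac{TP}{TP + FN}.
\]
```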

There are, I would say, about 12 or so, probably even 16, measures that combine true positives, false positives, false negatives and so on in various ways.

Wikipedia has a good introduction to that.

Now you should try and wrap your head around why we want exactly those, why those are the interesting ones, for this particular case of binary classification.

Of course, if you have larger classification tasks with more classes, then you can generalize those measures by having a separate binary classifier for every one of those classes.

So precision and recall actually apply here per class.
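A minimal sketch of that per-class, one-vs-rest view, assuming gold and predicted labels come as plain Python lists; the names gold, pred, and per_class_scores are illustrative, not from the lecture:

```python
from collections import defaultdict

def per_class_scores(gold, pred):
    """Precision and recall per class, treating each class as its own
    binary 'this class vs. the rest' problem."""
    tp, fp, fn = defaultdict(int), defaultdict(int), defaultdict(int)
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1      # correctly assigned to class g
        else:
            fp[p] += 1      # predicted p, but the gold label was g
            fn[g] += 1      # the gold class g was missed
    scores = {}
    for c in set(gold) | set(pred):
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        scores[c] = (prec, rec)
    return scores

# e.g. per_class_scores(["cat", "dog", "cat"], ["cat", "cat", "dog"])
# -> {'cat': (0.5, 0.5), 'dog': (0.0, 0.0)}
```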

So this is something to remember.

We looked at information retrieval.

The idea is, there are a couple of ideas; one is that you basically vectorize the words of a document.

The naive way of doing that is basically the word frequency vector.

And there are less naive ways of doing that, where you take frequency analysis, which as always is just counting, into account, to make more specific words have a higher impact than less specific words.

And that's really what this TF-IDF stuff does.

That's kind of the, I would say, baseline information retrieval.
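A minimal sketch of that weighting, assuming documents come in as already tokenized lists of words; the function name tf_idf and the unsmoothed IDF formula are assumptions, not something the lecture fixes:

```python
import math
from collections import Counter

def tf_idf(docs):
    """Turn tokenized documents into TF-IDF weighted vectors (as dicts).
    Words that occur in many documents get a low IDF and thus less impact."""
    n = len(docs)
    # document frequency: in how many documents does each word occur?
    df = Counter(word for doc in docs for word in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)                      # raw term frequency
        vectors.append({
            w: tf[w] * math.log(n / df[w])     # tf * idf
            for w in tf
        })
    return vectors

docs = [["cat", "sat", "on", "the", "mat"],
        ["the", "dog", "sat", "on", "the", "log"]]
print(tf_idf(docs))
```

In the toy example, "the" occurs in both documents and therefore gets weight zero, which is exactly the effect of less specific words having less impact.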

Basically, if you look at web search engines, and typically at most information retrieval systems, whatever they are based on, they basically have these stages.

The first one is that you want to harvest the information objects; for a web search engine that is essentially crawling the web and storing the whole web on your servers.

Then you typically want to clean them up somehow, do a lot of pre-processing.

And then you do some kind of vectorization, which allows you to do this cosine-similarity-based retrieval.
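The retrieval step then just ranks documents by how close their vectors are to the query vector. A minimal sketch on top of dict-style vectors like the ones from the TF-IDF sketch above; the helper names cosine and rank are hypothetical:

```python
import math

def cosine(u, v):
    """Cosine similarity of two sparse vectors given as {word: weight} dicts."""
    dot = sum(u[w] * v[w] for w in u if w in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank(query_vec, doc_vecs):
    """Indices of the documents, sorted by decreasing similarity to the query."""
    sims = [(cosine(query_vec, d), i) for i, d in enumerate(doc_vecs)]
    return [i for _, i in sorted(sims, reverse=True)]
```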

And then, either during that, for instance in the vector construction, or later, as in, say, PageRank, you try to somehow weave this notion of relevance, relevance to the average information need, into the system.

And depending on what the information needs are and on what the information objects are, there are quite a lot of variations on these.
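For the "later" variant, PageRank is the standard example: a query-independent relevance score estimated purely from the link structure. A minimal power-iteration sketch, assuming the web graph comes as a dict mapping each page to the list of pages it links to, with every page appearing as a key; the damping factor 0.85 is the conventional choice, not something fixed here:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Query-independent relevance scores per page, by power iteration.
    `links` maps every page to the list of pages it links to."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        new = {p: (1.0 - damping) / n for p in pages}
        for p, outgoing in links.items():
            if not outgoing:                      # dangling page: spread evenly
                for q in pages:
                    new[q] += damping * rank[p] / n
            else:
                for q in outgoing:
                    new[q] += damping * rank[p] / len(outgoing)
        rank = new
    return rank

# e.g. pagerank({"a": ["b"], "b": ["a", "c"], "c": ["a"]})
```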

But, and that's very important, information retrieval usually just gives you links to the information objects.

And then it's up to either a later stage of processing or the human to somehow extract

or even combine information to satisfy the information need.

So information retrieval is typically either something that's directly addressed to humans, who are very good at information processing, or a first stage in something more interesting.

If you think about the NLP task of, say, question answering, such systems typically have an information retrieval component as their first stage.
