29.5 Information Retrieval

Okay, welcome to the next video nugget.

This one is about information retrieval, a relatively simple application of language

technologies and natural language processing.

So what is information retrieval? That's the organization, representation, and storage of information objects in general, so that you give the user easy access to the relevant information and satisfy the user's various information needs.

That's a relatively general definition, and you usually come into contact with information retrieval in the form of web search, think Google or Bing or DuckDuckGo, those kinds of things.

And there the information that is organized, represented, and stored is the information of the World Wide Web, or at least the accessible portion of it.

And the information need is more interesting, right?

You come to a web search engine with lots of information needs.

So for instance, I want to find out what the weather is tomorrow, or I want to find out whether it is really true that Obama died yesterday, those kinds of things.

This information need is different from what you usually get offered by a web search engine, namely a little box in which you can fill in words.

So we think about information retrieval as something that retrieves information driven by an information need.

So web search is really, as you know, a fully automatic process that responds to a user query by returning a sorted list of documents, the famous ten blue links, that are relevant to the user requirements expressed in the query.

And you can think of the query as an expression of the information need, but it's not the same thing.

So typically what we do, and I'm going to show you now what's called the vector space

model for information retrieval, is that we basically think of documents and the queries

as essentially bags of words.

A bag is a multi-set.

So if you have a fixed vocabulary, then you can represent those as what we call word frequency vectors over the real numbers.

The length of the vector is actually the size of the vocabulary.

All we're actually going to use are natural-number entries, but sometimes you want to normalize the vectors, so it's convenient to think of them as real-valued right away.

So if we take two documents as an example, say "have a good day" and "have a great day", then of course the joint vocabulary is have, a, good, great, day. In that ordering, good sits in the third position and great in the fourth. The first document contains each vocabulary word exactly once except for great, so its word frequency vector has a zero in the great position: (1, 1, 1, 0, 1). Correspondingly, the second document has a zero in the good position: (1, 1, 0, 1, 1).
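To make that bookkeeping concrete, here is a minimal Python sketch of how such word frequency vectors can be computed. The function name, the variable names, and the fixed vocabulary ordering are illustrative choices of mine, not something from the lecture.

```python
from collections import Counter

def word_frequency_vector(document, vocabulary):
    """Count how often each vocabulary word occurs in the document."""
    counts = Counter(document.lower().split())
    return [counts[word] for word in vocabulary]

docs = ["have a good day", "have a great day"]
# Joint vocabulary in the order used in the example: have, a, good, great, day.
vocabulary = ["have", "a", "good", "great", "day"]

for doc in docs:
    print(doc, "->", word_frequency_vector(doc, vocabulary))
# have a good day  -> [1, 1, 1, 0, 1]   (zero in the 'great' position)
# have a great day -> [1, 1, 0, 1, 1]   (zero in the 'good' position)
```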

The idea that these vector space models pursue is that for web search, you can represent the documents as word frequency vectors, and then you return those documents whose word frequency vectors are, in a way, similar to the word frequency vector of the query.

Okay.

So in a picture, it is essentially this: we represent the documents and the query as vectors, and we're going to call two vectors similar if they point in essentially the same direction. And if you think of that direction, right...
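One standard way to make "pointing in essentially the same direction" precise is cosine similarity, the cosine of the angle between two vectors. Here is a small self-contained sketch that ranks the two toy documents against a hypothetical query "good day"; the query, the hard-coded vectors, and the names are illustrative assumptions, not part of the lecture.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norms = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norms if norms else 0.0

# Word frequency vectors over the vocabulary (have, a, good, great, day).
doc_vectors = {
    "have a good day":  [1, 1, 1, 0, 1],
    "have a great day": [1, 1, 0, 1, 1],
}
query_vector = [0, 0, 1, 0, 1]  # hypothetical query "good day"

ranked = sorted(doc_vectors,
                key=lambda d: cosine_similarity(doc_vectors[d], query_vector),
                reverse=True)
print(ranked)  # "have a good day" ranks first: its vector points closer to the query's
```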
