What is Probabilistic Retrieval?
In probabilistic retrieval, a numeric weight is assigned to each
retrieved document as an estimate of the probability that it is relevant
to the query. Documents are presented to the user in descending order of
weight, so those judged to be the best match appear at the top
of the hit list.
The two most important factors in the weight calculation are:
- the number of different query terms in the retrieved document,
- the frequency of those terms in the database as a whole. Frequent
terms are given a lower weight than rare terms, as they are assumed to be
less specific, and therefore less useful for pinpointing relevant
documents.
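As a minimal sketch of these two factors, the following Python assumes a simple logarithmic collection-frequency weight, log(N/n), where N is the number of documents in the database and n is the number containing the term. The function names and the exact formula are illustrative assumptions; the text does not specify the weighting function used.

```python
import math

def term_weight(num_docs, doc_freq):
    """Collection-frequency weight (assumed form): rarer terms score higher."""
    return math.log(num_docs / doc_freq)

def score(query_terms, doc_terms, doc_freqs, num_docs):
    """Sum the weights of the distinct query terms present in the document.

    doc_terms is the set of terms in the document; doc_freqs maps each
    term to the number of documents in the database that contain it.
    """
    return sum(
        term_weight(num_docs, doc_freqs[t])
        for t in set(query_terms)
        if t in doc_terms and doc_freqs.get(t, 0) > 0
    )
```

A document matching more of the distinct query terms accumulates more weight, and each match on a rare term contributes more than a match on a frequent one.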
When retrieving from databases of long documents, two other factors are
considered:
- the number of times each query term occurs in the retrieved
document,
- the length of the document as a whole.
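One way to combine all four factors is a BM25-style weighting function. The sketch below is an assumption for illustration: the text does not give a formula, and the parameter values k1 = 1.2 and b = 0.75 are conventional defaults rather than values taken from the source.

```python
import math

def bm25_term_score(tf, doc_len, avg_doc_len, num_docs, doc_freq,
                    k1=1.2, b=0.75):
    """Contribution of one query term to a document's score.

    tf: occurrences of the term in the document
    doc_len, avg_doc_len: this document's length and the database average
    num_docs, doc_freq: database size and number of documents with the term
    k1, b: tuning constants (assumed defaults)
    """
    # Rare terms receive a higher weight than frequent ones.
    idf = math.log((num_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0)
    # Longer-than-average documents are penalised; b controls how strongly.
    norm = k1 * (1.0 - b + b * doc_len / avg_doc_len)
    # Repeated occurrences help, but with diminishing returns.
    return idf * tf * (k1 + 1.0) / (tf + norm)
```

The score of a document is then the sum of this value over the query terms. For the same term count, a shorter document scores higher than a longer one, on the assumption that the term makes up a larger share of its content.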
Documents retrieved by the original query are presented to the user, who
is asked to supply relevance feedback about them. Given the set of
documents judged relevant, the original query terms are re-weighted, and
additional terms are extracted from those documents, in order to identify
other similar documents. This is an effective method of query expansion.
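The re-weighting step can be sketched with a Robertson/Sparck Jones style relevance weight, with candidate expansion terms ranked by the number of relevant documents containing them multiplied by that weight. Both the formula and the selection rule are assumptions made for illustration; the text does not say how terms are re-weighted or selected.

```python
import math

def relevance_weight(r, R, n, N):
    """Term weight given relevance feedback (assumed formula).

    r: relevant documents containing the term
    R: documents judged relevant
    n: documents in the database containing the term
    N: documents in the database
    """
    return math.log(((r + 0.5) * (N - n - R + r + 0.5)) /
                    ((n - r + 0.5) * (R - r + 0.5)))

def expansion_terms(term_stats, R, N, top_k=5):
    """Pick the top_k candidate terms; term_stats maps term -> (r, n)."""
    def selection_value(term):
        r, n = term_stats[term]
        return r * relevance_weight(r, R, n, N)
    return sorted(term_stats, key=selection_value, reverse=True)[:top_k]
```

A term that occurs in most of the relevant documents but in few documents overall receives a high weight, whereas a term that is common throughout the database gains little even if it appears in every relevant document.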
The probabilistic retrieval system used for the CILKS project was
OKAPI.