
Ranking Thresholds

The ranking threshold algorithm is applied to the scores of the retrieved documents. It partitions the list into documents that match the query "well", "fairly well", and "not very well", and drops those whose scores fall below a certain threshold. (Section 3.6.6 describes how the list of retrieved documents is presented.)

The method is based on the one used for earlier Okapi interfaces (VT100 and XOkapi), which was first presented in British Library Research Paper 72, Improving Subject Retrieval in Online Catalogues, by Steven Walker. The rules were relaxed slightly for ENQUIRE, notably by requiring that, whatever the degree of match, at least the top 50 documents are displayed. This relaxation was introduced because, in the context of the LBS database, the original method appeared to omit potentially useful documents when there were few matches. The basic rules depend on two prior definitions:

Given these definitions, the threshold settings depend upon the number of terms in the query. For a single-term query, all retrieved documents are said to match well, because they have the maximum possible score. Thereafter:

With five or more terms the procedure is the same as that for three or four, except that the proportions of the distribution are different. Documents with [...] of the maximum or more are said to match well, those with [...] of the maximum match fairly well, those with [...] of the maximum match not very well, and those with less than [...] of the maximum are discarded.
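The banding procedure can be illustrated with a short Python sketch. It is not the ENQUIRE implementation, and it rests on several assumptions: the function name band_scores and its parameters are hypothetical; the cut-off proportions are not given here, so the caller must supply them as fractions of the maximum possible score; the same banding logic is applied to every multi-term query, with only the proportions varying by query length; and documents kept solely because of the top-50 rule are labelled "not very well".

```python
from typing import List, Tuple

# From the text: at least the top 50 documents are always displayed.
MIN_DISPLAYED = 50


def band_scores(
    scores: List[float],
    n_terms: int,
    max_score: float,
    cutoffs: Tuple[float, float, float],
    min_displayed: int = MIN_DISPLAYED,
) -> List[Tuple[float, str]]:
    """Label each retained score 'well', 'fairly well' or 'not very well'.

    cutoffs = (well, fairly, drop) are fractions of max_score. They are
    caller-supplied placeholders, NOT the proportions used by ENQUIRE.
    max_score is assumed to be the maximum possible score for the query.
    """
    well, fairly, drop = cutoffs
    if not 1.0 >= well >= fairly >= drop >= 0.0:
        raise ValueError("cutoffs must satisfy 1 >= well >= fairly >= drop >= 0")

    ranked = sorted(scores, reverse=True)
    banded: List[Tuple[float, str]] = []
    for rank, score in enumerate(ranked):
        if n_terms == 1:
            # Single-term query: every retrieved document matches well.
            banded.append((score, "well"))
            continue
        fraction = score / max_score if max_score > 0 else 0.0
        if fraction >= well:
            label = "well"
        elif fraction >= fairly:
            label = "fairly well"
        elif fraction >= drop or rank < min_displayed:
            # Assumption: documents kept only by the top-N rule get this label.
            label = "not very well"
        else:
            # Scores are sorted, so every later document is discarded too.
            break
        banded.append((score, label))
    return banded
```

Passing the proportions as a parameter lets a caller choose a different triple according to the number of query terms, which is how the text describes the rules as varying.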






PAYNE A
Wed Jul 3 14:11:32 BST 1996