KBS Techniques in Thesaurus-based Query Expansion
A thesaurus can be regarded as a knowledge base, though one designed
for use by humans with both linguistic and domain expertise. An
intelligent system that exploits thesaurus knowledge would
perform automatic or system-assisted query expansion, based on the
strategies used by human intermediaries or expert searchers.
Existing heuristics are generally defined in the context of Boolean
searching, and involve explicit broadening or
narrowing of queries, for example:
- Use generic terms and related terms to obtain a very broad
treatment of the search topic,
- Move up or down in the thesaurus hierarchy to modify
specificity,
- For high specificity, use only controlled vocabulary.
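As a rough illustration of the first two heuristics, the sketch below broadens or narrows a single Boolean search term using a toy thesaurus. The thesaurus entries, the BT/NT/RT link labels and the function names `expand`, `broaden` and `narrow` are all assumptions made for this example; they are not taken from any particular system.

```python
# Toy thesaurus: each term maps to its broader (BT), narrower (NT)
# and related (RT) terms. All entries are invented for illustration.
THESAURUS = {
    "dogs": {"BT": ["mammals"], "NT": ["terriers"], "RT": ["kennels"]},
    "mammals": {"BT": ["animals"], "NT": ["dogs"], "RT": []},
    "terriers": {"BT": ["dogs"], "NT": [], "RT": []},
}


def expand(term, links):
    """Return the term followed by its neighbours along the given link types."""
    entry = THESAURUS.get(term, {})
    terms = [term]
    for link in links:
        for neighbour in entry.get(link, []):
            if neighbour not in terms:
                terms.append(neighbour)
    return terms


def broaden(term):
    """Broad treatment: OR the term with its generic and related terms."""
    return " OR ".join(expand(term, ["BT", "RT"]))


def narrow(term):
    """Move down the hierarchy: replace the term with its narrower terms.

    If the term has no narrower terms it is returned unchanged.
    """
    narrower = THESAURUS.get(term, {}).get("NT", [])
    return " OR ".join(narrower) if narrower else term


print(broaden("dogs"))  # dogs OR mammals OR kennels
print(narrow("dogs"))   # terriers
```

Each function applies one rule mechanically, with no judgement about whether the resulting query suits the search topic.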
Such suggestions could be offered by simple advice-giving
expert systems, but are too weak to serve as a basis for automatic
query expansion, even when converted into more formal rules.
Alternatively, terms found during thesaurus navigation may be
ordered and selected according to some weight reflecting
their "closeness" to the original query, or other potentially
useful properties. For example:
- the distance of the new term from the starting point
of the search,
- the number of individual paths linking the new term
with the starting point,
- the type of link followed, e.g. are hierarchical
relationships "stronger" than associative relationships?
- the specificity of the new term, judged by its:
  - number of occurrences in documents,
  - number of thesaurus connections,
  - position within thesaurus hierarchies.
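One way the first three properties might be combined into a single weight is sketched below. This is only an assumed scheme, not a procedure from the project: each path from the starting term contributes the product of its link strengths, and a term's score is the sum over all paths of bounded length. Longer paths therefore count for less, multiple paths count for more, and hierarchical links are (arbitrarily) set stronger than associative ones. The link strengths, the toy edges and the names `build_graph` and `score_terms` are all hypothetical, and specificity is left out to keep the sketch short.

```python
from collections import defaultdict

# Assumed link strengths: hierarchical links (BT/NT) count for more than
# associative ones (RT). The values are illustrative, not empirical.
LINK_WEIGHT = {"BT": 0.8, "NT": 0.8, "RT": 0.5}

# Toy thesaurus as (term, link type, term) edges; entries are invented.
EDGES = [
    ("dogs", "BT", "mammals"),
    ("dogs", "NT", "terriers"),
    ("dogs", "RT", "kennels"),
    ("mammals", "RT", "pets"),
    ("kennels", "RT", "pets"),
]


def build_graph(edges):
    """Build an adjacency list, adding the inverse of every link."""
    inverse = {"BT": "NT", "NT": "BT", "RT": "RT"}
    graph = defaultdict(list)
    for a, link, b in edges:
        graph[a].append((link, b))
        graph[b].append((inverse[link], a))
    return graph


def score_terms(graph, start, max_depth=2):
    """Score every term reachable from start within max_depth links.

    Each simple path contributes the product of its link weights, and
    contributions from different paths to the same term are summed.
    """
    scores = defaultdict(float)

    def walk(term, strength, visited, depth):
        if depth == max_depth:
            return
        for link, nxt in graph[term]:
            if nxt in visited:
                continue  # keep paths simple (no revisiting terms)
            s = strength * LINK_WEIGHT[link]
            scores[nxt] += s
            walk(nxt, s, visited | {nxt}, depth + 1)

    walk(start, 1.0, {start}, 0)
    # Highest score first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


graph = build_graph(EDGES)
for term, score in score_terms(graph, "dogs"):
    print(f"{term}: {score:.2f}")
```

With these assumed values, "pets" is two links away but reachable by two paths, so it outranks "kennels", which is one associative link away. Whether such an ordering matches searchers' behaviour is exactly the kind of question that needs empirical grounding.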
Much of the CILKS project was concerned with investigating user
patterns of thesaurus navigation under such headings, in order
to find a realistic basis for weighting procedures and
system-assisted term selection.