CILKS Final Report: Evaluation Experiments

Evaluation Experiments

Controlled experiments were conducted with users (members of staff and students within the School of Informatics at City University) who wished to find relevant references in the INSPEC document database. Users were asked to select thesaurus terms to enhance their query, then to provide relevance judgements on sets of documents retrieved by a) the original query, and b) various forms of enhanced query.

Data was collected automatically in the form of navigation logs, and also through short questionnaires administered to users before, during, and after their work with the thesaurus. Three separate experiments were run, with about 50 participants in all.

Experiment 1:
- finalised the thesaurus navigation logging procedure, and the form of the reports to be generated,
- identified user problems with thesaurus concepts and the navigation interface,
- found that "enhanced" queries consisting only of controlled-language thesaurus terms would not, on average, outperform the original query,
- established the need for user relevance judgements to be made on "pooled" document lists, to eliminate order effects.
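
The pooling idea in the last point can be illustrated with a minimal sketch. The report does not describe how the pooled lists were actually built, so everything here is an assumption: the function name `pool_results`, the merge-deduplicate-shuffle approach, and the document identifiers are hypothetical. The intent is only to show how the documents retrieved by several query variants might be presented as one list in which neither the source query nor the original ranking is visible to the person judging relevance.

```python
import random


def pool_results(result_lists, seed=None):
    """Merge several ranked result lists into one shuffled, de-duplicated pool.

    result_lists: dict mapping a query label (e.g. "original") to a list
    of document ids. Hypothetical structure, not taken from the report.
    """
    seen = set()
    pooled = []
    for docs in result_lists.values():
        for doc_id in docs:
            # Each document is judged once, even if several queries found it.
            if doc_id not in seen:
                seen.add(doc_id)
                pooled.append(doc_id)
    # Shuffle so that list position reveals neither query nor rank.
    random.Random(seed).shuffle(pooled)
    return pooled


# Made-up document ids, for illustration only.
pool = pool_results(
    {"original": ["d1", "d2", "d3"], "enhanced": ["d3", "d4", "d1"]},
    seed=0,
)
```

Because each document appears once in the pool, a single judgement can later be credited to every query that retrieved it.
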
Experiment 2:
- presented a more robust and convenient user interface, based on feedback from experiment 1,
- provided additional information (abstracts) from retrieved documents for user relevance judgements,
- investigated the performance of "hybrid" queries consisting of both original query terms and those extracted from the thesaurus,
- logged all individual relevance judgements in the database for detailed analysis of the effects of query enhancement (see relevance statistics).
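
As a rough illustration of the last two points, the sketch below builds a "hybrid" query and computes a simple precision figure per query type from logged judgements. It is a sketch under stated assumptions, not the project's implementation: the function names, the case-insensitive de-duplication, the data structures, and the choice of precision as the measure are all hypothetical, and the sample values are invented for the example rather than drawn from the experiments.

```python
def hybrid_query(original_terms, thesaurus_terms):
    """Combine original and thesaurus terms, dropping case-insensitive duplicates."""
    combined = []
    seen = set()
    for term in list(original_terms) + list(thesaurus_terms):
        key = term.lower()
        if key not in seen:
            seen.add(key)
            combined.append(term)
    return combined


def precision_by_query_type(retrieved, judgements):
    """Fraction of judged documents marked relevant, per query type.

    retrieved:  dict mapping query type -> list of retrieved document ids
    judgements: dict mapping document id -> True (relevant) / False
    Both structures are assumptions made for this sketch.
    """
    result = {}
    for query_type, docs in retrieved.items():
        judged = [d for d in docs if d in judgements]
        relevant = sum(1 for d in judged if judgements[d])
        result[query_type] = relevant / len(judged) if judged else 0.0
    return result


# Invented sample data, for illustration only.
terms = hybrid_query(["neural", "networks"], ["Neural nets", "networks"])
scores = precision_by_query_type(
    {"original": ["d1", "d2"], "hybrid": ["d1", "d3"]},
    {"d1": True, "d2": False, "d3": True},
)
```

Keeping the judgements keyed by document rather than by query is what would let one set of user judgements be reused to compare any number of query variants.
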
Experiment 3 focused on the handling of multi-faceted queries.