The TREC interactive task used 25 adhoc topics selected for this purpose by Donna Harman.
In what follows, it is assumed that (for the purposes of this task) each topic is the subject of an interactive search by a single searcher. The task has two parts: the primary task defines the interactive phase of the search; the secondary task (optional but recommended) involves the choice of a single search formulation and a subsequent off-line search. There is also an optional but recommended baseline task: a comparable non-interactive search.
Participants may need to refine the specification given, and/or provide additional guidance, in their instructions to their searchers. Any such refinement/guidance should form part of the report.
"Find as many documents as you can which address the given information problem, but without too much rubbish. You should complete the task in around 30 minutes or less."
It will be necessary for the system and/or the searcher to record and report on the progress and outcome of the search, in various ways. There follows a series of notes on specific items that need to be recorded; below is a partial specification of the reporting format.
Time taken: the elapsed (clock) time taken for the search, from the time the searcher first sees the topic until s/he declares the search to be finished, should be recorded. It is assumed that the interactive search takes place in one uninterrupted session. If a session is unavoidably interrupted, it is recommended that it be abandoned and the topic given to another searcher.
Sequence of events: all significant events in the course of the interaction should be recorded. The events listed below are those that seem to be fairly generally applicable to different systems and interactive environments; however, the list may need extending or modifying for specific systems.
Timing of events: it may be necessary to record the times of individual events in the interaction (see below).
Intermediate search formulations: if appropriate to the system, these should be recorded.
Documents viewed: "viewing" is taken to mean the searcher seeing a title or some other brief information about a document; these events should be recorded.
Documents seen: "seeing" is taken to mean the searcher seeing the text of a document, or a substantial section of text; these events should be recorded.
Terms entered by the searcher: if appropriate to the system, these should be recorded.
Terms seen (offered by the system): if appropriate to the system, these should be recorded.
Selection/rejection: documents or terms selected or rejected by the searcher for any further stage of the search (in addition to the final selection of documents).
"Sparse" format: a list of the identifiers of the selected documents for each topic, together with the elapsed (clock) time of the search.
"Rich" format: for each topic, the sequence of events as indicated above, and perhaps the times of events. A fuller specification of this rich format will be made at a later date; it is likely to require further interaction among the groups taking part, to ensure that all groups can comply.
Two further items should be reported. A full narrative description should be given of the interactive session for one designated topic (the topic will be specified at a later date). As indicated above, any refinement of the task specification and/or further guidance given to the searchers should also be reported.
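Purely by way of illustration of the "sparse" and "rich" formats described above (the rich format has yet to be specified, and every field name, event label, topic number and document identifier below is an assumption made for this example), the two formats might be rendered along the following lines, sketched here in Python:

    # Illustrative sketch only: field names, event labels, topic numbers and
    # document identifiers are placeholders, not part of the specification.

    # "Sparse" record for one topic: the identifiers of the selected documents
    # plus the elapsed (clock) time of the search.
    sparse_record = {
        "topic": 303,
        "elapsed_seconds": 1425,
        "selected": ["DOC-0042", "DOC-0101", "DOC-0317"],
    }

    # "Rich" record for one topic: the sequence of events, each with an
    # optional time (seconds from the start of the session).
    rich_record = {
        "topic": 303,
        "events": [
            {"t": 12,  "event": "FORMULATION",  "query": "example query terms"},
            {"t": 40,  "event": "VIEWED",       "doc": "DOC-0042"},  # title/brief info
            {"t": 55,  "event": "SEEN",         "doc": "DOC-0042"},  # full text
            {"t": 70,  "event": "SELECTED",     "doc": "DOC-0042"},
            {"t": 95,  "event": "TERM_SEEN",    "term": "offered-term"},
            {"t": 101, "event": "TERM_ENTERED", "term": "searcher-term"},
        ],
    }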
As secondary measures for the primary task, we will be looking for additional measures of performance, and also measures of the effort and complexity of the search, based on what actions and decisions the searcher takes. Measures of performance may include the utility measures being considered by the filtering track for set-retrieval evaluation. Effort measures may include the average search time per relevant selected, the number of documents viewed or seen, and the density of relevant documents selected as a proportion of documents viewed at each stage (a sort of local precision). Some such measures will be defined in association with the "rich" reporting format described above, but it is likely that different measures would be appropriate for different systems, and participants are invited to suggest their own ways of making such measurements and to present the results in their papers. The object would be not only to allow for between-system comparisons, but also to provide diagnostic information on any aspects of the search process.
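As a minimal sketch only (assuming the kind of counts that the hypothetical rich record above would yield, and that relevance judgements for the selected documents are available), the simpler of these measures might be computed as follows; overall figures are shown, though the per-stage "local precision" would use the same ratio computed stage by stage:

    def effort_measures(elapsed_seconds, viewed, seen, selected, relevant_selected):
        """Illustrative effort/performance measures; the names are assumptions.

        viewed, seen and selected are counts of the corresponding events;
        relevant_selected is the number of selected documents judged relevant.
        """
        return {
            "docs_viewed": viewed,
            "docs_seen": seen,
            "docs_selected": selected,
            # Average search time per relevant document selected.
            "seconds_per_relevant": (elapsed_seconds / relevant_selected
                                     if relevant_selected else None),
            # Relevant documents selected as a proportion of documents viewed
            # (here over the whole session rather than at each stage).
            "local_precision": (relevant_selected / viewed) if viewed else None,
        }

    print(effort_measures(elapsed_seconds=1425, viewed=60, seen=14,
                          selected=9, relevant_selected=7))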
At the end of the interactive session, the searcher is to generate a ranked list of 1000 documents. We envisage that this will be achieved by the searcher choosing an appropriate search formulation from those already tried, or defining a new one on the basis of the experience gained in the interactive session. The chosen formulation could then be run off-line. The target is to include as many relevant documents as possible, as high up as possible, in this ranked list.
The specification of this task to the searchers will depend somewhat on the nature of the system and therefore the manner in which the ranked list can be generated, and should be reported.
The relation between this ranked list and any documents which figured in the primary task will depend on the specific system, and therefore needs to be fully specified in the report. We envisage the following as an appropriate relation:
Generally, the report for this task should cover: the specification given to the searchers, and the relation between the ranked list and items encountered in the interactive session.
The ranked list of 1000 items will be evaluated using standard TREC evaluation methods, as used for the main TREC tasks. The results would be, at least at some level, comparable with the main adhoc TREC runs.
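As an illustration (a minimal sketch; the run tag, scores and file name are placeholders, and the track's own submission instructions take precedence), a ranked list in the usual TREC run format (topic number, "Q0", document identifier, rank, score, run tag) can be written as follows:

    # Append one topic's ranked list (up to 1000 documents) to a run file in
    # the usual TREC format: topic  Q0  docno  rank  score  run_tag
    def write_run(path, topic, ranked_docs, run_tag="EXAMPLE1"):
        with open(path, "a") as out:
            for rank, (docno, score) in enumerate(ranked_docs[:1000], start=1):
                out.write(f"{topic} Q0 {docno} {rank} {score:.4f} {run_tag}\n")

    write_run("secondary.run", topic=303,
              ranked_docs=[("DOC-0042", 17.3), ("DOC-0101", 15.8)])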
There are many possible ways one might set up such a baseline for comparison, though there may be systems for which no non-interactive baseline is possible. The idea is to make a run using essentially the same system as is used for the interactive run, but without interaction. The starting point might be the topics as given, or a manually derived query (constructed without reference to the documents).
The baseline task should produce a ranked list of 1000 documents, to be reported and evaluated in the usual TREC fashion.