SOCKER Kernel Evaluation: an Analysis of Searching Behaviour, User Perceptions and System Performance

Introduction

This project involved the Centre acting as a test site for an EU-funded project, carrying out user operational testing of the SOCKER kernel, which is based on the Z39.50 protocol. The test focused on the ability of the system to search different servers, or targets, holding a range of databases, from two remote clients in a networked environment.

The user interfaces of the two clients were compared: a graphical user interface for the Information Services Workstation (ISW) and a command-based system for the Network Service Point (NSP). In ISW, servers are grouped into sets described as 'profiles'. Users are offered a choice of profile to search, within which individual servers may be selected. In NSP, users are presented directly with a single list of servers to choose from. The experiment required users to carry out three set information retrieval tasks on one of the two clients, to test specific aspects of user behaviour and system functionality. The aims, objectives and methodology of the experiment are briefly described below.

Aims and objectives

The aim of the evaluation was to test the implementation of a system allowing for multi-target searching from a common client interface over a communication network. Specific evaluation objectives were as follows:

To compare the ease of use of the different client interfaces for query formulation and searching in a distributed environment,
To test the system's retrieval effectiveness for searching multiple databases, and
To determine the user satisfaction and perception of the system.

Methodology

Three search tasks were set to test different aspects of the system's functionality. User behaviour in formulating queries was monitored quantitatively, as far as possible from transaction logs, complemented by questionnaire data and comments from test administrators. More qualitative data and analysis were also provided by direct user observations made by the administrator at the City University site.

Search tasks tested a combination of features including:

searching across many targets for an item unique to a small number of them;
finding duplicate records from various targets to locate different editions of a work;
searching for a multi-faceted topic in several targets.

Data was collected and analysed from three sources:

Questionnaires: these sought information about users' experience of and familiarity with different online systems, how the various tasks were undertaken, and users' reactions to the system interfaces and facilities.
Transaction logs: these gave details of servers selected and accessed, connection times and search formulation. The latter included the terms used, any modification of terms, and the search fields employed.
Observation: this comprised notes made by site co-ordinators and by the research assistant at City University who administered the test. Much of this material is qualitative, but the findings from observation were complemented by the quantitative data derived from the logs. Because the data set collected at City University was more complete, it was possible to undertake more detailed analysis on this sub-set.
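As an illustration of the kind of quantitative summary that transaction logs of this sort could support, the following is a minimal sketch only. The log layout (the keys task, server, connect_seconds and field) and the function name summarise are assumptions made for the example; the actual SOCKER log format is not described here.

```python
from collections import defaultdict
from statistics import mean


def summarise(log_entries):
    """Summarise search activity per task from a list of log entries.

    Each entry is assumed (hypothetically) to be a dict with the keys
    'task', 'server', 'connect_seconds' and 'field'.
    """
    by_task = defaultdict(list)
    for entry in log_entries:
        by_task[entry["task"]].append(entry)

    summary = {}
    for task, entries in sorted(by_task.items()):
        summary[task] = {
            # Number of searches logged for this task.
            "searches": len(entries),
            # How many different servers were accessed.
            "distinct_servers": len({e["server"] for e in entries}),
            # Average connection time across the task's searches.
            "mean_connect_seconds": mean(e["connect_seconds"] for e in entries),
            # Which search fields were employed, in alphabetical order.
            "fields_used": sorted({e["field"] for e in entries}),
        }
    return summary
```

A summary of this shape could then be set alongside questionnaire responses and observation notes for the same task.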

Conclusions and recommendations

The operational pilot testing clearly demonstrated the successful implementation of the SOCKER Z39.50 kernel. However, in assessing the viability and quality of such a service for future development, other interrelated factors needed to be taken into consideration at different levels. These included:

The mapping of record formats and searchable fields across multiple databases,
Interface features to support effective query formulation and searching,
The division of labour between the client and the server in the network environment.

In addition, the pilot test raised questions regarding the experimental design and methodology of the evaluation.

Experimental design and methodology

There were a number of constraints on the test administration and the data collected which need to be overcome in future trials. Firstly, greater effort should be made to recruit end users as test subjects as well as professional library staff, since GUI clients are presumably aimed primarily at end users. Secondly, data collection needs to rely on more automatic methods, complemented by other means of eliciting information from users, for example through online questionnaires and transaction logging.

User performance and search tasks

The three search tasks set for the evaluation experiments were intended to represent standard bibliographic enquiries, and the given order was based on an increasing degree of difficulty. Contrary to expectations, Task 1, the specific item search, turned out to be the most difficult, with a substantial number of users failing to complete it. Conversely, Task 3, the subject search, was the easiest, with the majority of users succeeding. It is possible that the considerable effort required for Task 1 served to familiarise users with the system, to the benefit of subsequent tasks. However, it appears that this environment, unlike an individual library catalogue, is more amenable to a general browsing task than to a specific item search.

Test subjects experienced problems in searching multiple servers and databases for two main reasons:

They were unable to make an informed choice of the most appropriate server/database to search, because of the lack of information on the sources available and the random presentation or listing of the profiles, servers and databases.
They could not specify query terms which readily matched appropriate searchable fields in the bibliographic record. This was particularly problematic for end users.

It would appear that the logical grouping of available servers/databases is a paramount requirement. Even though defaults should probably be set to reduce the effort required by users in making choices, searchers need to have some flexibility and be able to easily intervene and tailor their own selection if necessary.

Although search intermediaries and end users may be aware that there is no common approach to searching different databases, owing to variations in indexing policies, the inconsistencies are greatly magnified when attempting to search multiple sources from a client with a common interface. The poor mapping of searchable fields, in spite of the standard MARC format, remains a major obstacle to effective searching.
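The field-mapping problem can be made concrete with a small sketch. The Use attribute numbers below are taken from the Z39.50 Bib-1 attribute set, but the target names, the sets of attributes each target supports, and the functions map_field and plan_query are hypothetical and do not describe the SOCKER implementation. The sketch assumes a client that falls back to a general 'any' index when a target does not index the requested field, and flags the fallback so that the interface could warn the searcher.

```python
# Bib-1 Use attribute values for some common access points.
BIB1_USE = {"author": 1003, "title": 4, "subject": 21, "isbn": 7, "any": 1016}

# Hypothetical capabilities: the Use attributes each target is assumed to index.
TARGET_SUPPORT = {
    "target_a": {4, 7, 1003, 1016},
    "target_b": {4, 21, 1016},
}


def map_field(target, field):
    """Return (use_attribute, exact_match) for a common field on one target.

    exact_match is False when the target does not index the requested
    field and the general 'any' index is used instead.
    """
    wanted = BIB1_USE[field]
    supported = TARGET_SUPPORT[target]
    if wanted in supported:
        return wanted, True
    if BIB1_USE["any"] in supported:
        return BIB1_USE["any"], False
    raise ValueError(f"{target} cannot search field {field!r}")


def plan_query(field, targets):
    """Map one common field onto every selected target."""
    return {target: map_field(target, field) for target in targets}
```

Under these assumptions, an author search sent to both targets would be exact on target_a but would be degraded to a general search on target_b, which is the kind of inconsistency a common interface tends to conceal from the user.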

Interaction and the client interfaces

The searching behaviour of the test subjects, and the ease with which they were able to carry out the tasks, were influenced by the different interface environments of the two clients. Although the test subjects of the NSP and ISW clients were primarily experienced with the interaction styles of the respective command-based and windows-based interfaces, some contrasting responses were observed or expressed.

The options for selecting multiple servers/databases and for formulating queries were more numerous and more complex in presentation for ISW than for NSP searchers, and this was not necessarily beneficial.
Feedback information and concurrent displays made it much easier for ISW searchers to keep track of the system's searching activity and contributed to greater user satisfaction.
Search response times for the ISW local client were more acceptable to searchers than those of the remote NSP client.

The increased number of options which can be presented to a user in a GUI environment can easily increase the cognitive load on the user and impede user/system interaction. More attention needs to be paid to striking a balance between user and system control in the search dialogue and process, and to determining the extent to which the system's operations are made apparent to the user.

System performance

The general impression is that the system does provide a usable search facility, but that there is considerable redundancy which might affect its viability if adopted on a large scale. The following appear to be occurring:

multiple associations in force for the same server simultaneously,
associations remaining in force when no longer required,
the same database being searched on different servers,
searches on inappropriate record fields.

The questionnaire data confirms that users in general had little understanding of the implications of their decisions, regarding choice of server for example, and were not aware when these gave rise to duplication of effort. Some of these problems could be solved by giving users more information and more control, for instance by allowing them to terminate unwanted associations explicitly. However, this approach is somewhat at variance with the presumed objective of the system being evaluated, which is to provide users with a simple front end and to make sensible decisions in the background on their behalf. So the better way forward may be to build more intelligence into the client software, to eliminate some redundant message interchanges and reduce network traffic accordingly.
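One way of picturing such client-side intelligence is sketched below. It is an illustration under stated assumptions, not a description of the SOCKER client: the class SearchPlanner, its methods, and the association handles are all hypothetical, and the sketch assumes that two servers offering a database under the same name are offering the same database. It addresses three of the redundancies listed above: a single association is reused per server, a database already covered via one server is not searched again on another, and associations no longer needed are closed.

```python
from dataclasses import dataclass, field


@dataclass
class SearchPlanner:
    """Tracks open associations and avoids redundant searches (illustrative)."""

    open_associations: dict = field(default_factory=dict)  # server -> handle
    opened: int = 0  # count of associations opened so far

    def association_for(self, server):
        # Reuse an existing association rather than opening a second one.
        if server not in self.open_associations:
            self.opened += 1
            self.open_associations[server] = f"assoc-{self.opened}"
        return self.open_associations[server]

    def plan(self, selections):
        """Turn the user's (server, database) choices into search steps.

        Each database is searched once only, on the first selected
        server that offers it.
        """
        seen_databases = set()
        steps = []
        for server, database in selections:
            if database in seen_databases:
                continue  # already covered via another server
            seen_databases.add(database)
            steps.append((self.association_for(server), server, database))
        return steps

    def close_unused(self, steps):
        """Drop associations the planned steps do not need; return their servers."""
        needed = {server for _, server, _ in steps}
        closed = [s for s in self.open_associations if s not in needed]
        for server in closed:
            del self.open_associations[server]
        return closed
```

A planner of this kind keeps the decisions in the background, in line with the aim of a simple front end, although a real client would also need a reliable way of recognising that two servers hold the same database.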