On the Biased Assessment of Expert Finding Systems

Jens-Joris Decorte; Jeroen Van Hautte; Chris Develder; Thomas Demeester

TL;DR

This case study demonstrates on a popular benchmark that system-validated annotations lead to overestimated performance of traditional term-based retrieval models and even invalidate comparisons with more recent neural methods. It also proposes constraints on the annotation process to prevent these biased evaluations.

Abstract

In large organisations, identifying experts on a given topic is crucial for leveraging the internal knowledge spread across teams and departments. So-called enterprise expert retrieval systems automatically discover and structure employees' expertise based on the vast amount of heterogeneous data available about them and the work they perform. Evaluating these systems requires comprehensive ground truth expert annotations, which are hard to obtain. The annotation process therefore typically relies on automated recommendations of knowledge areas, which the annotators then validate. This case study analyses how these recommendations can impact the evaluation of expert finding systems. We demonstrate on a popular benchmark that system-validated annotations lead to overestimated performance of traditional term-based retrieval models and even invalidate comparisons with more recent neural methods. We also augment knowledge areas with synonyms to uncover a strong bias towards literal mentions of their constituent words. Finally, we propose constraints on the annotation process to prevent these biased evaluations, and show that the constrained process still yields annotation suggestions of high utility. These findings should inform benchmark creation or selection for expert finding, to guarantee meaningful comparison of methods.

Paper Structure (19 sections, 1 equation, 2 figures, 1 table)

Introduction
Related Work
Expert finding systems
Expert finding benchmarks
Expert profiling benchmarks
Analysis of Annotation Schemes
TU Expert Collection
Distribution of system-validated annotations
Term-bias in system-validated annotations
Assessment of Expert Finding
Expert Finding Models
Term-based retrieval
Late-interaction neural retrieval
Query Augmentation
Alternative Annotation Suggestions
...and 4 more sections

Figures (2)

Figure 1: Distribution of tf-idf scores for self-selected topics that were part of the annotation suggestions versus those that were not. Significantly higher scores are observed for those that were part of the annotation suggestions.
Figure 2: Distribution of tf-idf scores for self-selected topics versus additional topics added through system validation.
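
Both figures compare tf-idf scores of topics against an expert's documents. The exact weighting the authors used is not given on this page, so the following is only a minimal sketch of one plausible scoring scheme: it sums term frequency times a smoothed inverse document frequency over the words of a topic. The function names, the whitespace tokenizer, the smoothing, and the toy corpus are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: lowercase whitespace tokenization, no stemming or stop words.
    return text.lower().split()


def tfidf_topic_score(topic, expert_docs, all_docs):
    """Sum tf * idf over the words of `topic` within one expert's documents.

    Hypothetical helper: idf is smoothed as log((1 + N) / (1 + df)) + 1,
    where N is the number of documents in `all_docs`.
    """
    n_docs = len(all_docs)

    # Document frequency: in how many documents does each term occur?
    df = Counter()
    for doc in all_docs:
        df.update(set(tokenize(doc)))

    # Term frequency, pooled over all documents of this expert.
    tf = Counter()
    for doc in expert_docs:
        tf.update(tokenize(doc))

    score = 0.0
    for term in tokenize(topic):
        if tf[term] == 0:
            continue  # words that are never mentioned contribute nothing
        idf = math.log((1 + n_docs) / (1 + df[term])) + 1.0
        score += tf[term] * idf
    return score


# Toy corpus (invented for this example, not taken from the benchmark).
expert_docs = ["we study machine learning for retrieval"]
corpus = expert_docs + [
    "history of medieval art",
    "statistical inference methods",
]

literal_score = tfidf_topic_score("machine learning", expert_docs, corpus)
unmentioned_score = tfidf_topic_score("statistical inference", expert_docs, corpus)
print(literal_score, unmentioned_score)
```

Under this scheme a topic scores above zero only when its constituent words literally occur in the expert's documents, which is the kind of term bias the figure captions refer to. A topic the expert covers under different wording would score zero here.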
