
An Intrinsic Framework of Information Retrieval Evaluation Measures

Fernando Giner

TL;DR

This paper tackles how to determine the metric and scale properties of IR evaluation measures from first principles. It introduces an intrinsic framework, $(\boldsymbol{\tilde{R}}, \preceq_f, d_f)$, derived from the measure itself, and a taxonomy that classifies measures as ordinal/pseudometric, ordinal/metric, or interval/metric based on their attained values. Through an analysis of set-based and rank-based measures, it shows that set-based measures typically enjoy metric and interval properties, while rank-based measures are mostly ordinal and pseudometric, with a few exceptions in the binary case that exhibit interval characteristics. The framework clarifies which mathematical operations are justifiable on evaluation scores and highlights the trade-off between formal properties and alignment with what users find useful. Overall, it provides a principled lens for IR theory and practice to reason about evaluation measures and their interpretability.
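To make the idea of classifying a measure by its attained values concrete, the sketch below enumerates every binary-relevance ranking of a fixed length k and collects the distinct scores of a set-based measure (precision at k) and a rank-based one (reciprocal rank). It then checks whether consecutive attained values are equally spaced, used here only as an illustrative proxy for interval behaviour. This criterion, the function names, and the choice of k are assumptions of this example, not the paper's method.

```python
from fractions import Fraction
from itertools import product

def precision_at_k(ranking):
    """Set-based: fraction of relevant documents in the top k (k = len(ranking))."""
    return Fraction(sum(ranking), len(ranking))

def reciprocal_rank(ranking):
    """Rank-based: 1/position of the first relevant document, 0 if none."""
    for pos, rel in enumerate(ranking, start=1):
        if rel:
            return Fraction(1, pos)
    return Fraction(0)

def attained_values(measure, k):
    """All distinct values the measure takes over binary rankings of length k."""
    return sorted({measure(r) for r in product((0, 1), repeat=k)})

def equally_spaced(values):
    """True if consecutive attained values differ by one constant step (illustrative proxy)."""
    gaps = {b - a for a, b in zip(values, values[1:])}
    return len(gaps) <= 1

k = 4
for name, m in [("P@k", precision_at_k), ("RR", reciprocal_rank)]:
    vals = attained_values(m, k)
    print(name, [str(v) for v in vals], equally_spaced(vals))
```

With k = 4, precision at k attains 0, 1/4, 1/2, 3/4 and 1 (equally spaced), whereas reciprocal rank attains 0, 1/4, 1/3, 1/2 and 1, whose gaps differ. Exact fractions are used so that the spacing test is not disturbed by floating-point rounding.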

Abstract

Information retrieval (IR) evaluation measures are cornerstones for determining the suitability and task performance efficiency of retrieval systems. Their metric and scale properties make it possible to compare one system against another and establish differences or similarities. Based on the representational theory of measurement, this paper determines these properties by exploiting the information contained in a retrieval measure itself. It establishes the intrinsic framework of a retrieval measure, which covers the common scenario in which the domain set is not explicitly specified. A method to determine the metric and scale properties of any retrieval measure is provided; it requires knowledge of only some of the measure's attained values. The method establishes three main categories of retrieval measures according to their intrinsic properties. Some common user-oriented and system-oriented evaluation measures are classified according to the presented taxonomy.

Paper Structure (13 sections, 3 theorems, 26 equations, 1 figure, 3 tables)

Key Result

Proposition

Let $(\mathbf{R}, \preceq_{f}, d_{f})$ be the intrinsic framework of an IR evaluation measure $f$. Then the associated distance $d_{f}$ is a pseudometric.
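One way to see why a distance derived from scores is only a pseudometric is that distinct rankings can receive the same score. The sketch below assumes, purely for illustration, a score-induced distance $|f(x) - f(y)|$ with $f$ taken to be reciprocal rank on binary rankings of length 3; the paper's $d_f$ may be defined differently. The helpers `induced_distance` and `check_pseudometric` are hypothetical and test the axioms by brute force on a finite set.

```python
from fractions import Fraction
from itertools import product

def reciprocal_rank(ranking):
    """1/position of the first relevant document (binary relevance), 0 if none."""
    for pos, rel in enumerate(ranking, start=1):
        if rel:
            return Fraction(1, pos)
    return Fraction(0)

def induced_distance(f):
    """Hypothetical stand-in for d_f (assumption): absolute difference of scores."""
    return lambda x, y: abs(f(x) - f(y))

def check_pseudometric(points, d):
    """Brute-force axiom check on a finite set; returns (is_pseudometric, is_metric)."""
    if any(d(x, x) != 0 for x in points):
        return False, False
    pairs = list(product(points, repeat=2))
    if any(d(x, y) < 0 or d(x, y) != d(y, x) for x, y in pairs):
        return False, False
    if any(d(x, z) > d(x, y) + d(y, z) for x, y, z in product(points, repeat=3)):
        return False, False
    # A pseudometric is a metric iff distinct points never lie at distance 0.
    is_metric = all(d(x, y) != 0 for x, y in pairs if x != y)
    return True, is_metric

rankings = list(product((0, 1), repeat=3))
d = induced_distance(reciprocal_rank)
print(check_pseudometric(rankings, d))  # (True, False)
print(d((1, 0, 0), (1, 1, 0)))          # 0
```

Rankings (1, 0, 0) and (1, 1, 0) are distinct but both score 1, so their distance is 0 and identity of indiscernibles fails. Keeping one representative per score (i.e. working on the quotient set) restores the metric property for this particular distance.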

Figures (1)

  • Figure 1: Example of a Hasse diagram, $G_{f}$, associated with a retrieval measure.
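A Hasse diagram like $G_f$ can be sketched under a simplifying assumption: that the intrinsic order is $x \preceq_f y$ iff $f(x) \le f(y)$, so rankings with equal scores collapse into one class and the classes form a chain over the attained values. The hypothetical `path_distance` then counts covering edges between classes. This illustrates the construction only; it is not necessarily how the paper builds $G_f$ or $d_f$.

```python
from collections import defaultdict
from fractions import Fraction
from itertools import product

def reciprocal_rank(ranking):
    """1/position of the first relevant document (binary relevance), 0 if none."""
    for pos, rel in enumerate(ranking, start=1):
        if rel:
            return Fraction(1, pos)
    return Fraction(0)

def hasse_chain(points, f):
    """Group points into classes of equal score and link consecutive classes.

    Assumes x <= y iff f(x) <= f(y) (a total preorder), so the Hasse diagram
    of the quotient is a chain over the attained values.
    """
    classes = defaultdict(list)
    for p in points:
        classes[f(p)].append(p)
    levels = sorted(classes)                  # attained values, ascending
    edges = list(zip(levels, levels[1:]))     # covering relations of the chain
    return levels, dict(classes), edges

def path_distance(levels, f):
    """Hypothetical d_f: number of Hasse-diagram edges between the classes of x and y."""
    index = {v: i for i, v in enumerate(levels)}
    return lambda x, y: abs(index[f(x)] - index[f(y)])

rankings = list(product((0, 1), repeat=3))
levels, classes, edges = hasse_chain(rankings, reciprocal_rank)
d = path_distance(levels, reciprocal_rank)
print([str(v) for v in levels])                          # ['0', '1/3', '1/2', '1']
print(len(edges))                                        # 3
print(d((0, 0, 1), (0, 1, 0)), d((0, 1, 0), (1, 0, 0)))  # 1 1
```

Consecutive classes are one edge apart even though the underlying reciprocal-rank gaps (1/6 between 1/3 and 1/2, and 1/2 between 1/2 and 1) differ, which is what an ordinal rather than interval reading of the scores looks like.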

Theorems & Definitions (8)

  • Remark
  • Definition
  • Proposition
  • Proposition
  • Proposition
  • Proof
  • Proof
  • Proof