An Intrinsic Framework of Information Retrieval Evaluation Measures

Fernando Giner

TL;DR

This paper tackles how to determine the metric and scale properties of IR evaluation measures from first principles. It introduces an intrinsic framework, $(\boldsymbol{\tilde{R}}, \preceq_f, d_f)$, derived from the measure itself, and a taxonomy that classifies measures into ordinal/pseudometric, ordinal/metric, or interval/metric based on attained values. Through analysis of set-based and rank-based measures, it shows that set-based measures typically enjoy metric and interval properties, while rank-based measures are mostly ordinal and pseudometric, with a few binary-case exceptions yielding interval characteristics. The framework clarifies which mathematical operations are justifiable on evaluation scores and highlights the trade-off between formal properties and aligning with user usefulness. Overall, it provides a principled lens for IR theory and practice to reason about evaluation measures and their interpretability.

Abstract

Information retrieval (IR) evaluation measures are cornerstones for determining the suitability and task performance efficiency of retrieval systems. Their metric and scale properties make it possible to compare one system against another and to establish differences or similarities between them. Based on the representational theory of measurement, this paper determines these properties by exploiting the information contained in a retrieval measure itself. It establishes the intrinsic framework of a retrieval measure, which applies in the common scenario where the domain set is not explicitly specified. A method to determine the metric and scale properties of any retrieval measure is provided, requiring knowledge of only some of its attained values. The method establishes three main categories of retrieval measures according to their intrinsic properties. Some common user-oriented and system-oriented evaluation measures are classified according to the presented taxonomy.

Paper Structure

This paper contains 13 sections, 3 theorems, 26 equations, 1 figure, and 3 tables.

Introduction
Related Work
Formalisation of the Intrinsic Framework
Intrinsic Properties of an IR Evaluation Measure
Metric Properties of an IR Evaluation Measure
Scale Properties of an IR Evaluation Measure
Intrinsic Taxonomy of IR Evaluation Measures
Some Examples
Set-Based Retrieval
Rank-Based Retrieval
Conclusions
Appendix
Formal Proofs

Key Result

Proposition.

Let $(\mathbf{R}, \preceq_{f}, d_{f})$ be the intrinsic framework of an IR evaluation measure $f$. Then the associated distance $d_{f}$ is a pseudometric.
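To make the difference between a pseudometric and a metric concrete, the following is a minimal sketch rather than the paper's construction. It assumes, purely for illustration, that the distance induced by a measure is the absolute difference of scores; this summary does not give the actual definition of $d_{f}$. The names `precision`, `induced_distance`, `is_pseudometric`, and `is_metric` are hypothetical. The sketch checks the pseudometric axioms exhaustively on all binary rankings of length 3 and shows why such a distance can fail to be a metric: two distinct rankings with the same score lie at distance zero.

```python
from fractions import Fraction
from itertools import product


def precision(ranking):
    """Fraction of relevant items (1s) in a binary relevance tuple."""
    return Fraction(sum(ranking), len(ranking))


def induced_distance(f, a, b):
    """ASSUMPTION: distance taken as the absolute difference of scores.

    This is an illustrative choice, not the definition from the paper.
    """
    return abs(f(a) - f(b))


def is_pseudometric(f, items):
    """Check d(a, a) = 0, non-negativity, symmetry, triangle inequality."""
    for a, b, c in product(items, repeat=3):
        d_ab = induced_distance(f, a, b)
        if induced_distance(f, a, a) != 0:
            return False
        if d_ab < 0:
            return False
        if d_ab != induced_distance(f, b, a):
            return False
        if induced_distance(f, a, c) > d_ab + induced_distance(f, b, c):
            return False
    return True


def is_metric(f, items):
    """A metric also requires d(a, b) > 0 whenever a and b differ."""
    separates = all(
        induced_distance(f, a, b) > 0
        for a, b in product(items, repeat=2)
        if a != b
    )
    return is_pseudometric(f, items) and separates


# All binary relevance patterns over three retrieved documents.
rankings = list(product((0, 1), repeat=3))

pseudo_ok = is_pseudometric(precision, rankings)
metric_ok = is_metric(precision, rankings)
tie_distance = induced_distance(precision, (1, 0, 0), (0, 1, 0))
```

Under this assumed distance, `(1, 0, 0)` and `(0, 1, 0)` receive the same precision, so they are indistinguishable by distance even though they are different rankings. That collapse of distinct items is exactly what separates a pseudometric from a metric.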

Figures (1)

Figure 1: Example of a Hasse diagram, $G_{f}$, associated with a retrieval measure.

Theorems & Definitions (8)

Remark
Definition
Proposition
Proposition
Proposition
Proof
Proof
Proof
