T-Retrievability: A Topic-Focused Approach to Measure Fair Document Exposure in Information Retrieval
Xuejun Chang, Zaiqiao Meng, Debasis Ganguly
TL;DR
The paper addresses exposure fairness in information retrieval by showing that traditional collection-level retrievability conflates topical relevance priors with access. It introduces Topical-Retrievability (T-Retrievability), a localized measure that computes retrievability over groups of topically related queries and aggregates these scores into a collection-level statistic using Gini-based exposure fairness. The method replaces the cut-off-dependent retrievability with a rank-based formulation $r(D, \mathcal{C}, \mathcal{Q}, \theta) = \frac{1}{|\mathcal{Q}|} \sum_{Q \in \mathcal{Q}} \frac{1}{\log(1+\rho(D;Q, \theta))}$ and leverages real user queries from MS MARCO dev, grouping them via K-means on both sparse (TF-IDF) and dense (SBERT) representations into $K$ topical clusters. By computing $r(D, \mathcal{C}, \mathcal{Q}_i, \theta)$ for each topic, deriving per-topic Gini coefficients, and aggregating them with min/avg/max, the paper demonstrates that localized analysis reveals nuanced exposure fairness patterns that collection-level measures miss. Experiments with BM25, SPLADE, TCT-ColBERT, and reranked variants on MS MARCO show substantial variation in exposure fairness across models and topic granularities, underscoring the value of topic-focused auditing for fair document exposure in IR systems.
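The pipeline summarised above (rank-based retrievability per topic, a Gini coefficient per topic, then min/avg/max aggregation) can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes that topic labels for the queries are already available (for example from K-means, which is not shown here), that the logarithm is natural, that a document not retrieved for a query contributes 0, and that each per-topic Gini coefficient is taken over the whole collection. The function names `rank_based_retrievability`, `gini` and `t_retrievability` are hypothetical.

```python
import math
from collections import defaultdict


def rank_based_retrievability(rankings, num_docs):
    """Average 1 / log(1 + rank) per document over a set of queries.

    rankings: one ranked list of integer doc ids (0..num_docs-1) per query.
    Assumption: natural log; unretrieved documents contribute 0.
    """
    scores = [0.0] * num_docs
    for ranked in rankings:
        for rank, doc in enumerate(ranked, start=1):  # ranks start at 1
            scores[doc] += 1.0 / math.log(1 + rank)
    n_queries = len(rankings)
    return [s / n_queries for s in scores] if n_queries else scores


def gini(values):
    """Gini coefficient of non-negative values (0 means perfectly even)."""
    xs = sorted(values)
    n, total = len(xs), sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((i + 1) * x for i, x in enumerate(xs))
    return 2.0 * weighted / (n * total) - (n + 1) / n


def t_retrievability(rankings, topic_of_query, num_docs):
    """Per-topic Gini coefficients aggregated with min / avg / max.

    topic_of_query: one topic label per query (assumed precomputed).
    Assumption: each per-topic Gini is computed over all num_docs documents.
    """
    by_topic = defaultdict(list)
    for ranked, topic in zip(rankings, topic_of_query):
        by_topic[topic].append(ranked)
    ginis = [
        gini(rank_based_retrievability(topic_rankings, num_docs))
        for topic_rankings in by_topic.values()
    ]
    return {"min": min(ginis), "avg": sum(ginis) / len(ginis), "max": max(ginis)}


if __name__ == "__main__":
    # Toy data: three queries in two topics over a three-document collection.
    toy_rankings = [[0, 1], [1, 0], [2]]
    toy_topics = [0, 0, 1]
    print(t_retrievability(toy_rankings, toy_topics, num_docs=3))
```

In the toy data, the first topic exposes two of the three documents equally, while the second topic exposes only one, so the second topic yields the larger Gini coefficient.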
Abstract
Retrievability of a document is a collection-based statistic that measures its expected (reciprocal) rank of being retrieved within a specific rank cut-off. A collection with uniformly distributed retrievability scores across documents is an indicator of fair document exposure. While retrievability scores have been used to quantify the fairness of exposure for a collection, in our work, we use the distribution of retrievability scores to measure the exposure bias of retrieval models. We hypothesise that an uneven distribution of retrievability scores across the entire collection may not accurately reflect exposure bias but rather indicate variations in topical relevance. As a solution, we propose a topic-focused localised retrievability measure called T-Retrievability (topic-retrievability), which first computes retrievability scores over multiple groups of topically-related documents, and then aggregates these localised values to obtain the collection-level statistics. Our analysis using the proposed T-Retrievability measure uncovers new insights into the exposure characteristics of various neural ranking models. The findings suggest that this localised measure provides a more nuanced understanding of exposure fairness, offering a more reliable approach for assessing document accessibility in IR systems.
