Modeling Ranking Properties with In-Context Learning
Nilanjan Sinhababu, Andrew Parry, Debasis Ganguly, Pabitra Mitra
TL;DR
This work addresses multi-objective information retrieval, where relevance must be balanced against auxiliary objectives such as fairness and diversity. It introduces an in-context learning (ICL) framework that conditions a language model on demonstrations of desired ranking properties drawn from similar queries, eliminating the need for task-specific training. The method defines target distributions over metadata attributes and uses a greedy KL-divergence-based induction to reorder top-ranked documents so that the final list aligns with the specified objectives. Empirical results across MS MARCO, TREC DL, TREC Fairness, and Touché show that demonstration-based ICL improves diversity and fairness while maintaining relevance, outperforming prompt-based baselines and several post-hoc methods. The work suggests demonstration-guided model adaptation as a practical, training-free approach for dynamic, multi-objective ranking in real-world IR systems, while acknowledging ethical considerations and limitations such as reliance on existing query logs and model capacity.
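To make the greedy KL-divergence step concrete, here is a minimal Python sketch of one way such a reordering could work. It is an illustration under stated assumptions, not the authors' implementation: the function names (`kl_divergence`, `greedy_kl_rerank`), the additive smoothing constant `eps`, the direction of the divergence (target against the empirical prefix distribution), and the tie-breaking rule (keep the more relevant document) are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def kl_divergence(target, counts, total, eps=1e-6):
    """KL(target || empirical prefix distribution).

    Additive smoothing (an assumption of this sketch) keeps the value
    finite when a group has not yet appeared in the prefix.
    """
    k = len(target)
    kl = 0.0
    for group, p in target.items():
        if p <= 0:
            continue  # zero-probability groups contribute nothing
        q = (counts.get(group, 0) + eps) / (total + eps * k)
        kl += p * math.log(p / q)
    return kl


def greedy_kl_rerank(docs, target, eps=1e-6):
    """Reorder docs so each prefix tracks the target attribute distribution.

    docs:   list of (doc_id, group) pairs, already sorted by relevance.
    target: dict mapping group -> desired proportion (sums to 1).
    """
    remaining = list(docs)
    ranked = []
    counts = Counter()
    while remaining:
        best_idx, best_kl = 0, float("inf")
        for i, (_, group) in enumerate(remaining):
            counts[group] += 1  # tentatively place this candidate
            kl = kl_divergence(target, counts, len(ranked) + 1, eps)
            counts[group] -= 1  # undo the tentative placement
            if kl < best_kl:  # strict '<' keeps the more relevant doc on ties
                best_idx, best_kl = i, kl
        doc = remaining.pop(best_idx)
        counts[doc[1]] += 1
        ranked.append(doc)
    return ranked


# Toy example: relevance order puts all of group A first.
docs = [("d1", "A"), ("d2", "A"), ("d3", "A"),
        ("d4", "B"), ("d5", "B"), ("d6", "B")]
ranked = greedy_kl_rerank(docs, {"A": 0.5, "B": 0.5})
```

With a uniform target over two groups, each successive pair of positions in `ranked` contains one document from each group, while relevance order is preserved within a group. The loop is quadratic in the number of candidates, which is acceptable for the short top-ranked lists this kind of reordering is applied to.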
Abstract
While standard IR models are mainly designed to optimize relevance, real-world search often needs to balance additional objectives such as diversity and fairness. These objectives depend on inter-document interactions and are commonly addressed using post-hoc heuristics or supervised learning methods, which require task-specific training for each ranking scenario and dataset. In this work, we propose an in-context learning (ICL) approach that eliminates the need for such training. Instead, our method relies on a small number of example rankings that demonstrate the desired trade-offs between objectives for past queries similar to the current input. We evaluate our approach on four IR test collections to investigate multiple auxiliary objectives: group fairness (TREC Fairness), polarity diversity (Touché), and topical diversity (TREC Deep Learning 2019/2020). We empirically validate that our method enables control over ranking behavior through demonstration engineering, allowing nuanced behavioral adjustments without explicit optimization.
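The demonstration-selection idea described above can be sketched in a few lines of Python. This is a hedged illustration only: the token-overlap (Jaccard) similarity, the prompt layout, and the names `jaccard`, `build_icl_prompt`, and `demo_pool` are assumptions made for this example and are not taken from the paper, which may use a different similarity function and prompt format.

```python
def jaccard(a, b):
    """Token-overlap similarity between two query strings (illustrative choice)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def build_icl_prompt(query, candidates, demo_pool, k=2):
    """Assemble a prompt from the k past queries most similar to `query`.

    demo_pool: list of (past_query, example_ranking) pairs, where each
               example_ranking lists documents in an order that already
               exhibits the desired trade-off (e.g. alternating stances).
    """
    demos = sorted(demo_pool, key=lambda d: jaccard(query, d[0]), reverse=True)[:k]
    parts = []
    for past_query, ranking in demos:
        lines = [f"Query: {past_query}", "Ranking:"]
        lines += [f"{i}. {doc}" for i, doc in enumerate(ranking, start=1)]
        parts.append("\n".join(lines))
    # The current query comes last, with its ranking left for the model to fill in.
    target = [f"Query: {query}", "Candidates:"]
    target += [f"- {doc}" for doc in candidates]
    target.append("Ranking:")
    parts.append("\n".join(target))
    return "\n\n".join(parts)


demo_pool = [
    ("is nuclear power safe", ["pro: low emissions", "con: waste storage"]),
    ("best pasta recipes", ["carbonara", "cacio e pepe"]),
    ("nuclear energy pros and cons", ["con: accident risk", "pro: reliable baseload"]),
]
prompt = build_icl_prompt(
    "is nuclear energy safe",
    ["pro: modern reactors", "con: proliferation"],
    demo_pool,
    k=2,
)
```

Here the two nuclear-related past queries are selected as demonstrations and the unrelated one is dropped. Changing which example rankings are placed in `demo_pool` is what the abstract calls demonstration engineering: the ordering shown in the examples, rather than any retraining, is what steers the model's output.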
