RankingSHAP -- Listwise Feature Attribution Explanations for Ranking Models

Maria Heuss, Maarten de Rijke, Avishek Anand

TL;DR

RankingSHAP introduces listwise feature attribution for ranking models to address the limitations of pointwise explanations. It extends SHAP through listwise masking and flexible explanation objectives (e.g., Kendall's tau $\tau$), enabling a faithful, contrastive understanding of ranking decisions across ranked lists. The authors define two evaluation paradigms (Preservation and Deletion) to assess attribution faithfulness on the learning-to-rank (LtR) benchmarks MQ2008 and MSLR, and provide a white-box toy example to illustrate the interpretability benefits. Empirical results demonstrate that RankingSHAP yields faithful attributions and can reveal model biases. The authors acknowledge computational costs and interpretability challenges, and provide a public code repository for reproducibility.

Abstract

While SHAP (SHapley Additive exPlanations) and other feature attribution methods are commonly employed to explain model predictions, their application within information retrieval (IR), particularly for complex outputs such as ranked lists, remains limited. Existing attribution methods typically provide pointwise explanations, focusing on why a single document received a high ranking score, rather than considering the relationships between documents in a ranked list. We present three key contributions to address this gap. First, we rigorously define listwise feature attribution for ranking models. Second, we introduce RankingSHAP, extending the popular SHAP framework to accommodate listwise ranking attribution, addressing a significant methodological gap in the field. Third, we propose two novel evaluation paradigms for assessing the faithfulness of attributions in learning-to-rank models, measuring the correctness and completeness of the explanation with respect to different aspects. Through experiments on standard learning-to-rank datasets, we demonstrate RankingSHAP's practical application while identifying the constraints of selection-based explanations. We further employ a simulated study with an interpretable model to showcase how listwise ranking attributions can be used to examine model decisions and conduct a qualitative evaluation of explanations. Due to the contrastive nature of the ranking task, our understanding of ranking model decisions can substantially benefit from feature attribution explanations like RankingSHAP.
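
To make the idea of listwise attribution with a Kendall's tau objective concrete, the following is a minimal sketch and not the authors' implementation. It assumes a toy linear scorer, a single all-zero background vector for masking, and exact enumeration of feature coalitions (feasible only for a handful of features; a practical implementation would presumably rely on a sampling-based SHAP approximation). The names `kendall_tau`, `rank_docs`, `listwise_shapley`, and `toy_score` are hypothetical.

```python
from itertools import combinations
from math import factorial


def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two rankings of the same documents (no ties)."""
    pos_b = {doc: i for i, doc in enumerate(rank_b)}
    n = len(rank_a)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        # rank_a puts rank_a[i] ahead of rank_a[j]; check whether rank_b agrees.
        if pos_b[rank_a[i]] < pos_b[rank_a[j]]:
            concordant += 1
        else:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)


def rank_docs(score_fn, docs):
    """Document indices sorted by descending score (ties keep input order)."""
    return sorted(range(len(docs)), key=lambda i: -score_fn(docs[i]))


def listwise_shapley(score_fn, docs, background):
    """Exact Shapley values of features for a listwise objective.

    The value of a feature coalition is Kendall's tau between the original
    ranking and the ranking obtained when every feature outside the
    coalition is replaced by its background value in ALL documents at once
    (listwise masking).
    """
    n_feat = len(background)
    original = rank_docs(score_fn, docs)

    def value(kept):
        masked = [
            [x if f in kept else background[f] for f, x in enumerate(doc)]
            for doc in docs
        ]
        return kendall_tau(original, rank_docs(score_fn, masked))

    phi = [0.0] * n_feat
    for f in range(n_feat):
        others = [g for g in range(n_feat) if g != f]
        for size in range(len(others) + 1):
            weight = factorial(size) * factorial(n_feat - size - 1) / factorial(n_feat)
            for subset in combinations(others, size):
                kept = set(subset)
                phi[f] += weight * (value(kept | {f}) - value(kept))
    return phi


# Assumed toy setup: three documents, three features, linear scorer.
def toy_score(doc):
    return 3.0 * doc[0] + 1.0 * doc[1] + 0.0 * doc[2]


docs = [[0.1, 0.5, 0.9], [0.9, 0.2, 0.1], [0.5, 0.9, 0.3]]
phi = listwise_shapley(toy_score, docs, background=[0.0, 0.0, 0.0])
print(phi)
```

In this toy setup the attributions sum to the difference between the objective with all features present and with all features masked, which is the usual Shapley efficiency property. Because the same coalition is masked in every document, each attribution describes a feature's effect on the whole ranked list rather than on a single document's score.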

Paper Structure (36 sections, 10 equations, 4 figures, 4 tables, 1 algorithm)

Figures (4)

  • Figure 1: Flow chart of a biased and an unbiased model for a talent search task. With the help of explanations we would like to be able to differentiate between the two.
  • Figure 2: Feature attribution values for different query scenarios from Section \ref{section:experiments_scenarios}.
  • Figure 3: Feature attribution values for RankingSHAP with the $g_q^{exp(d)}$ exposure objective defined in Section \ref{section:method_explanation_objectives}, and for Pointwise SHAP, for individual candidates in the ranked list.
  • Figure 4: Preservation (a, c, e, g) and Deletion (b, d, f, h) checks. Only the top-$k$ features of the explanations are kept/masked. For the Kendall's tau measure, higher numbers represent higher similarity with the original ranking, so for the Preservation check higher is better, while for the Deletion check lower is better. For the exposure-based measure, it is exactly the other way around, since lower numbers represent exposure closer to the original one.
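
The Preservation and Deletion checks described in the Figure 4 caption can be sketched as follows. This is an illustrative sketch under stated assumptions, not the paper's evaluation code: features are ordered by descending attribution value, masking replaces a feature with an all-zero background value in every document, the scorer is a toy linear model, and the attribution values are made up for the example. All function and variable names are hypothetical.

```python
from itertools import combinations


def rank_docs(score_fn, docs):
    """Document indices sorted by descending score (ties keep input order)."""
    return sorted(range(len(docs)), key=lambda i: -score_fn(docs[i]))


def kendall_tau(rank_a, rank_b):
    """Kendall's tau between two rankings of the same documents (no ties)."""
    pos_b = {doc: i for i, doc in enumerate(rank_b)}
    n = len(rank_a)
    concordant = sum(
        1 for i, j in combinations(range(n), 2)
        if pos_b[rank_a[i]] < pos_b[rank_a[j]]
    )
    total = n * (n - 1) / 2
    return (2 * concordant - total) / total


def mask(docs, masked_features, background):
    """Replace the given features by background values in every document."""
    return [
        [background[f] if f in masked_features else x for f, x in enumerate(doc)]
        for doc in docs
    ]


def preservation_and_deletion(score_fn, docs, background, attributions, k):
    """Return (preservation_tau, deletion_tau) for the top-k attributed features."""
    original = rank_docs(score_fn, docs)
    order = sorted(range(len(attributions)), key=lambda f: -attributions[f])
    top_k, rest = set(order[:k]), set(order[k:])
    # Preservation: keep only the top-k features, mask everything else.
    preserved = rank_docs(score_fn, mask(docs, rest, background))
    # Deletion: mask the top-k features, keep everything else.
    deleted = rank_docs(score_fn, mask(docs, top_k, background))
    return kendall_tau(original, preserved), kendall_tau(original, deleted)


# Assumed toy setup; the attribution values are illustrative, not real results.
def toy_score(doc):
    return 3.0 * doc[0] + 1.0 * doc[1] + 0.0 * doc[2]


docs = [[0.1, 0.5, 0.9], [0.9, 0.2, 0.1], [0.5, 0.9, 0.3]]
pres, dele = preservation_and_deletion(
    toy_score, docs, background=[0.0, 0.0, 0.0],
    attributions=[1.0, 0.2, 0.0], k=1,
)
print(pres, dele)
```

With Kendall's tau as the measure, a faithful explanation should give a high preservation value (the top-$k$ features alone reproduce the ranking) and a low deletion value (removing them disrupts it), matching the reading of the caption.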

Theorems & Definitions (1)

  • Definition 1