OrdShap: Feature Position Importance for Sequential Black-Box Models

Davin Hill, Brian L. Hill, Aria Masoomi, Vijay S. Nori, Robert E. Tillman, Jennifer Dy

TL;DR

OrdShap targets the long-standing challenge of attributing predictions from sequential models to both token values and their positions. It builds a Shapley-based attribution with positional conditioning via a generalized characteristic function $\tilde{\nu}$, producing a $d\times d$ attribution matrix $\boldsymbol{\gamma}$ and two concise summaries: OrdShap-VI for value importance and OrdShap-PI for position importance. The authors establish a theoretical link between OrdShap-VI and the Sanchez-Bergantiños value, and provide two practical approximation schemes, Sampling and Least-Squares, to make the method scalable. Empirically, OrdShap demonstrates improved disentanglement of value and position across health, natural language, and synthetic data, offering more nuanced explanations for sequential black-box models with potential impact in high-stakes domains like healthcare.

Abstract

Sequential deep learning models excel in domains with temporal or sequential dependencies, but their complexity necessitates post-hoc feature attribution methods for understanding their predictions. While existing techniques quantify feature importance, they inherently assume fixed feature ordering - conflating the effects of (1) feature values and (2) their positions within input sequences. To address this gap, we introduce OrdShap, a novel attribution method that disentangles these effects by quantifying how a model's predictions change in response to permuting feature position. We establish a game-theoretic connection between OrdShap and Sanchez-Bergantiños values, providing a theoretically grounded approach to position-sensitive attribution. Empirical results from health, natural language, and synthetic datasets highlight OrdShap's effectiveness in capturing feature value and feature position attributions, and provide deeper insight into model behavior.
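
To make the idea of position-conditioned attribution concrete, here is a minimal sketch on a toy model; it is not the authors' implementation. Everything in it is an assumption made for illustration: the additive toy model (`VALUES`, `SLOPES`), the helper names (`char_fn`, `attribution_matrix`, `summarize`), the choice to summarize value importance as the row mean of the matrix, and the choice to summarize position importance as the least-squares slope of each row against position. It enumerates all subsets and permutations exactly, which is only feasible for very small $d$.

```python
from itertools import combinations, permutations
from math import factorial

# Hypothetical toy model (an assumption, not from the paper): a present
# feature j contributes VALUES[j] * (1 + SLOPES[j] * position); a masked
# feature contributes 0.
VALUES = [2.0, 1.0, -1.0]
SLOPES = [0.0, 0.5, 1.0]
D = len(VALUES)


def char_fn(subset, sigma):
    """Toy model output when only `subset` is present and feature j sits at position sigma[j]."""
    return sum(VALUES[j] * (1.0 + SLOPES[j] * sigma[j]) for j in subset)


def attribution_matrix():
    """gamma[i][k]: Shapley-weighted marginal contribution of feature i,
    averaged over all permutations that place feature i at position k."""
    gamma = [[0.0] * D for _ in range(D)]
    for i in range(D):
        others = [j for j in range(D) if j != i]
        for k in range(D):
            # All orderings in which feature i occupies position k.
            sigmas = [s for s in permutations(range(D)) if s[i] == k]
            for size in range(D):
                # Standard Shapley weight for a coalition of this size.
                weight = factorial(size) * factorial(D - size - 1) / factorial(D)
                for subset in combinations(others, size):
                    marginal = sum(
                        char_fn(subset + (i,), s) - char_fn(subset, s)
                        for s in sigmas
                    ) / len(sigmas)
                    gamma[i][k] += weight * marginal
    return gamma


def summarize(gamma):
    """Illustrative summaries: row mean (value) and least-squares slope over position."""
    value_importance = [sum(row) / D for row in gamma]
    mean_pos = (D - 1) / 2
    denom = sum((k - mean_pos) ** 2 for k in range(D))
    position_importance = [
        sum((k - mean_pos) * (row[k] - vi) for k in range(D)) / denom
        for row, vi in zip(gamma, value_importance)
    ]
    return value_importance, position_importance


if __name__ == "__main__":
    gamma = attribution_matrix()
    vi, pi = summarize(gamma)
    print("gamma:", gamma)
    print("value importance:", vi)
    print("position importance:", pi)
```

In this toy setting, feature 0 has no position sensitivity, so its row of the matrix is flat and its position summary is zero, while features 1 and 2 show slopes of opposite sign. A conventional attribution computed at one fixed ordering would fold these two effects into a single number per feature.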

Paper Structure

This paper contains 41 sections, 3 theorems, 71 equations, 9 figures, 7 tables, 2 algorithms.

Key Result

Theorem 1

Given a set of players $N = \{1,\ldots,d\}$ and a characteristic function $\tilde{\omega}:\mathcal{P}(N) \times \mathfrak{S}_N \rightarrow \mathbb{R}$, there exists a corresponding function $\omega: \Omega \rightarrow \mathbb{R}$, representing $\tilde{\omega}$ averaged over the permutations $\sigma

Figures (9)

  • Figure 1: Predicting hospital Length-of-Stay (LOS) $\geq 3$ days for a patient using a sequence of medical tokens, representing tests and medications (App. \ref{app:ehr_background}). Traditional feature attribution methods capture model sensitivity to token values. However, we observe that permuting token order has a significant effect on the predicted risk over time (Left) and final prediction (Right), even when token values are unchanged.
  • Figure 2: Overview of OrdShap on a medical example. (A) OrdShap takes a sequence of features (i.e. tokens) as input, then (B) evaluates the black-box model while sampling different token permutations (i.e. reordering tokens) and subsets. (C) We summarize attributions due to token value and token position using OrdShap-VI and OrdShap-PI, respectively. Positive OrdShap-VI indicates that the presence of the token in the sequence increases the probability of LOS$\geq 3$. Positive OrdShap-PI indicates that probability of LOS$\geq 3$ increases when the token appears later in the sequence; high magnitude indicates high model sensitivity to token position.
  • Figure 3: Toy example illustrating the differences between (A) Shapley Values, (B) OrdShap, (C) OrdShap-VI, and (D) OrdShap-PI. Values are calculated on the sample [Hat, Hat, Hat, Bag, R-Glove, R-Glove], with characteristic function defined in §\ref{sec:example}.
  • Figure 4: Evaluation of OrdShap-PI (blue) by permuting an increasing number of features and calculating the change in the predicted probability of the predicted class (higher is better). On average, permuting features according to the OrdShap-PI attributions increases or maintains the model's prediction, in contrast to existing methods. AUC calculations and error bars are provided in App. \ref{app:auc}.
  • Figure 5: Attributions for a synthetic dataset and model $f_{\textrm{nonlinear}}$. (A) Token values; tokens are assigned Value Importance (VI) and/or Position Importance (PI) with respect to sequence index $i$. (B) OrdShap-VI and OrdShap-PI attributions are able to separate the different tokens based on VI and PI effects. (C) In contrast, attributions from existing methods cannot distinguish between the different tokens since VI and PI effects are entangled.
  • ...and 4 more figures

Theorems & Definitions (9)

  • Definition 1
  • Theorem 1
  • Definition 2
  • Theorem 2
  • Corollary 2.1
  • proof
  • proof
  • proof
  • Remark