OrdShap: Feature Position Importance for Sequential Black-Box Models
Davin Hill, Brian L. Hill, Aria Masoomi, Vijay S. Nori, Robert E. Tillman, Jennifer Dy
TL;DR
OrdShap targets the long-standing challenge of attributing predictions from sequential models to both token values and their positions. It builds a Shapley-based attribution with positional conditioning via a generalized characteristic function $\tilde{\nu}$, producing a $d \times d$ attribution matrix $\boldsymbol{\gamma}$ and two concise summaries: OrdShap-VI for value importance and OrdShap-PI for position importance. The authors establish a theoretical link between OrdShap-VI and the Sanchez-Bergantiños value, and provide two practical approximation schemes, Sampling and Least-Squares, to make the method scalable. Empirically, OrdShap demonstrates improved disentanglement of value and position across health, natural language, and synthetic data, offering more nuanced explanations for sequential black-box models with potential impact in high-stakes domains like healthcare.
Abstract
Sequential deep learning models excel in domains with temporal or sequential dependencies, but their complexity necessitates post-hoc feature attribution methods for understanding their predictions. While existing techniques quantify feature importance, they inherently assume a fixed feature ordering, conflating the effects of (1) feature values and (2) their positions within input sequences. To address this gap, we introduce OrdShap, a novel attribution method that disentangles these effects by quantifying how a model's predictions change in response to permuting feature position. We establish a game-theoretic connection between OrdShap and Sanchez-Bergantiños values, providing a theoretically grounded approach to position-sensitive attribution. Empirical results from health, natural language, and synthetic datasets highlight OrdShap's effectiveness in capturing feature value and feature position attributions, and provide deeper insight into model behavior.
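To make the idea of a position-conditioned attribution matrix concrete, the following is a minimal Monte Carlo sketch, not the authors' Sampling or Least-Squares estimator. It assumes that entry `gamma[i, k]` can be illustrated as the average marginal contribution of feature `i` when it is placed at position `k`, taken over random arrangements of the other features and Shapley-style random coalitions. The function names, the zero baseline used for absent features, the toy position-weighted model, and the two summaries (a mean over positions and a least-squares slope over positions) are all assumptions chosen for illustration; the paper's exact definitions of OrdShap-VI and OrdShap-PI may differ.

```python
import numpy as np

def positional_attributions(f, x, baseline, n_samples=50, seed=0):
    """Hypothetical helper: estimate a d x d matrix of position-conditioned
    marginal contributions for a black-box function f over a length-d input."""
    rng = np.random.default_rng(seed)
    d = len(x)
    gamma = np.zeros((d, d))
    for i in range(d):
        others = np.array([j for j in range(d) if j != i], dtype=int)
        for k in range(d):
            free_pos = np.array([p for p in range(d) if p != k], dtype=int)
            total = 0.0
            for _ in range(n_samples):
                # Random arrangement: feature i sits at position k,
                # the remaining features are shuffled over the other positions.
                pos = np.empty(d, dtype=int)
                pos[i] = k
                pos[others] = rng.permutation(free_pos)
                # Shapley-style coalition: features that precede i
                # in a uniformly random ordering of all features.
                order = rng.permutation(d)
                cut = int(np.where(order == i)[0][0])
                coalition = order[:cut]
                # Absent features are represented by the baseline (an assumption).
                z_without = baseline.astype(float).copy()
                z_without[pos[coalition]] = x[coalition]
                z_with = z_without.copy()
                z_with[k] = x[i]
                total += f(z_with) - f(z_without)
            gamma[i, k] = total / n_samples
    return gamma

def summarize(gamma):
    """Illustrative summaries only: mean over positions (value-like) and
    least-squares slope over positions (position-like) for each feature."""
    d = gamma.shape[0]
    value_summary = gamma.mean(axis=1)
    # polyfit with 2-D y fits one line per column; row 0 holds the slopes.
    position_summary = np.polyfit(np.arange(d), gamma.T, 1)[0]
    return value_summary, position_summary

# Toy position-sensitive model: later positions carry larger weights.
w = np.array([0.5, 1.0, 1.5])
toy_model = lambda z: float(w @ z)

x = np.array([1.0, 2.0, 3.0])
gamma = positional_attributions(toy_model, x, baseline=np.zeros(3))
value_summary, position_summary = summarize(gamma)
```

For this linear toy model, moving feature `i` to position `k` contributes exactly `w[k] * x[i]` regardless of the coalition, so the estimated matrix equals the outer product of `x` and `w`. A model that ignores position would instead produce rows that are constant across columns, and hence zero slopes in the position summary.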
