Higher-Order Feature Attribution: Bridging Statistics, Explainable AI, and Topological Signal Processing
Kurt Butler, Guanchao Feng, Petar Djuric
TL;DR
This work tackles the challenge of attributing predictions when input features interact nonlinearly. It builds a general theory of higher-order feature attribution by composing Integrated Gradients (IG) operators. Writing $A_i$ for the IG operator associated with feature $i$, it defines second-order attributions $a_{ij}(\boldsymbol{x}) = A_i A_j f(\boldsymbol{x})$ and extends the construction to higher orders while preserving symmetry and marginalization properties. The framework connects to topological signal processing by viewing explanations as signals on graphs or simplicial complexes with tensor representations, and in the second-order case it recovers established results such as Integrated Hessians. Empirical results on synthetic data and a real estate task show that the method recovers ground-truth interaction structures and reveals joint feature effects, offering interpretable insights into complex models.
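The composition $a_{ij}(\boldsymbol{x}) = A_i A_j f(\boldsymbol{x})$ can be made concrete with a small numerical sketch. It assumes that $A_i$ is standard Integrated Gradients for feature $i$ with a fixed baseline $\boldsymbol{x}'$, and it uses Gauss-Legendre quadrature for the path integral and nested central finite differences for the gradients. The helper names (`partial_derivative`, `ig_operator`, `second_order_attributions`), the step size `h`, and the toy model `f` are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

# Gauss-Legendre nodes and weights on [-1, 1], rescaled to [0, 1] so that
# sum_k WEIGHTS[k] * h(ALPHAS[k]) approximates int_0^1 h(alpha) d alpha.
_NODES, _RAW_WEIGHTS = np.polynomial.legendre.leggauss(16)
ALPHAS = 0.5 * (_NODES + 1.0)
WEIGHTS = 0.5 * _RAW_WEIGHTS


def partial_derivative(f, x, i, h=1e-3):
    """Central finite-difference estimate of df/dx_i at x (hypothetical helper).

    h = 1e-3 is an illustrative choice that keeps rounding error small when
    the differences are nested, as they are in second_order_attributions.
    """
    step = np.zeros_like(x)
    step[i] = h
    return (f(x + step) - f(x - step)) / (2.0 * h)


def ig_operator(f, i, baseline):
    """Return the function A_i f, i.e. Integrated Gradients for feature i.

    Assumed definition (standard IG with a fixed baseline x'):
        A_i f(x) = (x_i - x'_i) * int_0^1 df/dx_i(x' + alpha * (x - x')) d alpha
    """
    baseline = np.asarray(baseline, dtype=float)

    def attributed(x):
        x = np.asarray(x, dtype=float)
        # Gradient along feature i at each quadrature point on the straight path.
        grads = np.array([
            partial_derivative(f, baseline + a * (x - baseline), i) for a in ALPHAS
        ])
        return (x[i] - baseline[i]) * np.dot(WEIGHTS, grads)

    return attributed


def second_order_attributions(f, x, baseline):
    """Matrix of a_ij(x) = A_i A_j f(x), built by composing IG operators."""
    n = len(x)
    a = np.zeros((n, n))
    for j in range(n):
        a_j_f = ig_operator(f, j, baseline)               # first-order A_j f
        for i in range(n):
            a[i, j] = ig_operator(a_j_f, i, baseline)(x)  # A_i (A_j f) at x
    return a


# Usage: a toy model with one pairwise interaction (features 0 and 1) and one
# self-interaction (feature 2); indices are 0-based here.
def f(x):
    return x[0] * x[1] + x[2] ** 2


x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
a = second_order_attributions(f, x, baseline)
print(np.round(a, 4))
```

For this toy model, working the integrals by hand gives $a_{00} = a_{01} = a_{10} = a_{11} = 0.5$, $a_{22} = 9$, and zero elsewhere. The matrix is symmetric, its entries sum to $f(\boldsymbol{x}) - f(\boldsymbol{x}') = 11$, and each column sums to the first-order attribution $A_j f(\boldsymbol{x})$. Nested finite differences only keep the sketch dependency-light; an automatic-differentiation library would give more accurate gradients for non-polynomial models.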
Abstract
Feature attributions are post-training analysis methods that assess how the various input features of a machine learning model contribute to an output prediction. Their interpretation is straightforward when features act independently, but becomes less direct when the predictive model involves interactions such as multiplicative relationships or joint feature contributions. In this work, we propose a general theory of higher-order feature attribution, which we develop on the foundation of Integrated Gradients (IG). This work extends existing frameworks in the literature on explainable AI. When using IG as the method of feature attribution, we discover natural connections to statistics and topological signal processing. We provide several theoretical results that establish the theory, and we validate it on a few examples.
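Because the theory is built on IG, a short calculation indicates how composed attributions can inherit IG's completeness property. This is a sketch under stated assumptions rather than the paper's own proof. Assume that each $A_i$ is the standard IG operator for feature $i$ with a fixed baseline $\boldsymbol{x}'$, that $f$ is smooth along the straight-line path, and that $a_{ij} = A_i A_j f$. IG completeness, $\sum_i A_i g(\boldsymbol{x}) = g(\boldsymbol{x}) - g(\boldsymbol{x}')$, holds for any such function $g$. Applying it to $g = A_j f$ and noting that $A_j f(\boldsymbol{x}') = 0$, because the prefactor $x_j - x_j'$ vanishes at the baseline, gives a marginalization-type identity:

$$
A_i f(\boldsymbol{x}) = (x_i - x_i') \int_0^1 \frac{\partial f}{\partial x_i}\bigl(\boldsymbol{x}' + \alpha(\boldsymbol{x} - \boldsymbol{x}')\bigr)\, d\alpha,
$$

$$
\sum_{i=1}^{n} a_{ij}(\boldsymbol{x}) = \sum_{i=1}^{n} A_i \bigl(A_j f\bigr)(\boldsymbol{x}) = A_j f(\boldsymbol{x}) - A_j f(\boldsymbol{x}') = A_j f(\boldsymbol{x}),
\qquad
\sum_{i=1}^{n} \sum_{j=1}^{n} a_{ij}(\boldsymbol{x}) = \sum_{j=1}^{n} A_j f(\boldsymbol{x}) = f(\boldsymbol{x}) - f(\boldsymbol{x}').
$$

Summing a second-order attribution over one index therefore returns a first-order IG attribution, and summing over both indices returns the total change in the model output relative to the baseline.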
