Partial Order in Chaos: Consensus on Feature Attributions in the Rashomon Set
Gabriel Laberge, Yann Pequignot, Alexandre Mathieu, Foutse Khomh, Mario Marchand
TL;DR
The paper addresses post-hoc explanation under model under-specification. Instead of relying on attributions from a single model, it keeps only the statements that hold across the Rashomon Set of all good models, which yields partial orders over local and global feature importance. The consensus framework uses optimization over ellipsoids and combinatorial relaxations to derive these local and global attribution relations, and adds error-tolerance controls (capture bounds and heuristics) and sensitivity analyses. The authors instantiate the framework for Additive Regression, Kernel Ridge, and Random Forests, derive practical procedures for local and global consensus, and show on real datasets such as Kaggle-Houses and Adult-Income that partial orders give robust, cautious interpretations even when under-specification is high. Consensus-based partial orders preserve the informative part of an explanation while avoiding overconfident or conflicting claims. The approach thus offers a principled route to explanations that respect uncertainty in model choice and data noise, potentially guiding safer decision-making in high-stakes domains.
Abstract
Post-hoc global/local feature attribution methods are increasingly used to understand the decisions of complex machine learning models. Yet, because of limited amounts of data, it is possible to obtain a diversity of models that have good empirical performance but provide very different explanations for the same prediction, making it hard to derive insight from them. In this work, instead of aiming at reducing the under-specification of model explanations, we fully embrace it and extract logical statements about feature attributions that are consistent across all models with good empirical performance (i.e. all models in the Rashomon Set). We show that partial orders of local/global feature importance arise from this methodology, enabling more nuanced interpretations by allowing pairs of features to be incomparable when there is no consensus on their relative importance. We prove that every relation among features present in these partial orders also holds in the rankings provided by existing approaches. Finally, we present three use cases employing hypothesis spaces with tractable Rashomon Sets (Additive models, Kernel Ridge, and Random Forests) and show that partial orders allow one to extract consistent local and global interpretations of models despite their under-specification.
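To make the consensus idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' procedure: it approximates the Rashomon Set by a finite list of models (the paper instead reasons over the whole set through optimization), it compares features by the absolute value of their attribution, and the function name `consensus_partial_order` and the toy attribution values are hypothetical. Feature i is placed below feature j only when every model agrees; otherwise the pair is left incomparable.

```python
import numpy as np

def consensus_partial_order(attributions):
    """Return a boolean matrix R with R[i, j] True iff feature i is at most
    as important as feature j in EVERY model (hypothetical helper).

    attributions: array of shape (n_models, n_features), one row of
    feature attributions per model in a finite sample of good models.
    """
    mag = np.abs(np.asarray(attributions, dtype=float))
    # diff[m, i, j] = |phi_j| - |phi_i| for model m
    diff = mag[:, None, :] - mag[:, :, None]
    # Consensus: the relation must hold for all sampled models.
    relation = np.all(diff >= 0, axis=0)
    # A feature is not compared with itself.
    np.fill_diagonal(relation, False)
    return relation

# Toy attributions (made up for illustration): 3 models, 3 features.
phi = np.array([
    [0.10, 0.50, 0.90],
    [0.20, 0.40, 1.00],
    [0.45, 0.30, 0.80],  # this model ranks feature 0 above feature 1
])
order = consensus_partial_order(phi)

for i, j in zip(*np.nonzero(order)):
    print(f"feature {i} <= feature {j} in every model")
```

In this toy example, features 0 and 1 are both below feature 2 in every model, so those two relations survive. Features 0 and 1 are left incomparable because the third model disagrees with the other two on their relative order. A total ranking from any single row of `phi` contains every relation kept here, which mirrors the abstract's claim that the partial order is consistent with existing rankings.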
