
Sum-of-Parts: Self-Attributing Neural Networks with End-to-End Learning of Feature Groups

Weiqiu You, Helen Qu, Marco Gatti, Bhuvnesh Jain, Eric Wong

TL;DR

Sum-of-Parts (SOP) is a model-agnostic framework that converts any differentiable model into a group-based Self-Attributing Neural Network (SANN) by learning feature groups end-to-end without group supervision. The authors prove that per-feature SANNs incur fundamental errors that grow rapidly with dimension when features are correlated, whereas group-based SANNs can achieve zero error when the groups align with the correlations in the data. SOP combines a learnable Group Generator, a Backbone Predictor, and a SparseCrossAttn-based Group Selector to produce interpretable, sparse, and faithful attributions. It achieves state-of-the-art performance among SANNs on vision and language tasks and shows practical utility in model debugging and cosmological discovery. The framework is validated on ImageNet-S, CosmoGrid, and MultiRC, with analyses showing robust interpretability, semantic coherence, and domain-specific insights, including new cosmological findings about voids and clusters. Overall, SOP offers a scalable path to faithful explanations that preserve predictive performance and support real-world scientific and diagnostic applications.
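
The three components named above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation and rests on several assumptions: the group masks are given instead of being produced by a learned Group Generator, the hypothetical `backbone` is a toy linear map, and a softmax over the top-k selector scores stands in for SparseCrossAttn. What it does show is the sum-of-parts structure: the prediction is a weighted sum of backbone outputs on masked inputs, and groups outside the top k receive a weight of exactly zero, so each weight can be read directly as that group's attribution.

```python
import numpy as np


def sop_forward(x, group_masks, backbone, selector_scores, k=2):
    """Sketch of a sum-of-parts prediction (assumed components, not SOP's code).

    x:               (d,) input features
    group_masks:     (m, d) binary masks, one row per feature group
                     (given here; SOP learns them with its Group Generator)
    backbone:        callable mapping a masked (d,) input to (c,) logits
    selector_scores: (m,) relevance score per group (hypothetical stand-in
                     for the SparseCrossAttn Group Selector)
    k:               number of groups kept; all others get weight exactly 0
    """
    # Backbone prediction on each masked input, one row per group.
    group_logits = np.stack([backbone(x * g) for g in group_masks])  # (m, c)

    # Sparse weights: softmax over the top-k scores, zeros elsewhere.
    scores = np.asarray(selector_scores, dtype=float)
    top = np.argsort(scores)[-k:]
    exp = np.exp(scores[top] - scores[top].max())
    weights = np.zeros_like(scores)
    weights[top] = exp / exp.sum()

    # Linear aggregation: the output is a weighted sum of group predictions.
    return weights @ group_logits, weights


# Usage with a hypothetical toy linear backbone.
rng = np.random.default_rng(0)
d, c = 6, 2
W = rng.normal(size=(c, d))


def backbone(z):
    return W @ z


x = rng.normal(size=d)
masks = np.array([[1, 1, 1, 0, 0, 0],
                  [0, 0, 1, 1, 1, 0],
                  [0, 0, 0, 0, 1, 1]], dtype=float)
y, w = sop_forward(x, masks, backbone, selector_scores=[2.0, 0.5, 1.0], k=2)
print("group weights:", w)
print("prediction:", y)
```

Keeping the aggregation linear is what makes the model self-attributing: the explanation is the set of selected groups together with their weights, not a post-hoc approximation.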

Abstract

Self-attributing neural networks (SANNs) present a potential path towards interpretable models for high-dimensional problems, but often face significant trade-offs in performance. In this work, we formally prove a lower bound on errors of per-feature SANNs, whereas group-based SANNs can achieve zero error and thus high performance. Motivated by these insights, we propose Sum-of-Parts (SOP), a framework that transforms any differentiable model into a group-based SANN, where feature groups are learned end-to-end without group supervision. SOP achieves state-of-the-art performance for SANNs on vision and language tasks, and we validate that the groups are interpretable on a range of quantitative and semantic metrics. We further validate the utility of SOP explanations in model debugging and cosmological scientific discovery. Our code is available at https://github.com/BrachioLab/sop

Paper Structure (77 sections, 9 theorems, 77 equations, 19 figures, 9 tables)

Key Result

Theorem 2.3

Let $p:\{0,1\}^d\rightarrow \{0,1,2\}$ be a multilinear binomial polynomial function, and suppose that the features can be partitioned into $(S_1,S_2,S_3)$ of equal sizes such that $p(x) = \prod_{i\in S_1 \cup S_2} x_i + \prod_{j\in S_2\cup S_3} x_j$. Then, $\sum_{S\subseteq[d]} \mathrm{InsErr}$ …
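
The binomial in Theorem 2.3 can be made concrete with a small enumeration. The sketch below assumes $d = 6$ with blocks of size 2 (an illustrative choice) and evaluates $p$ on all $2^d$ binary inputs. It then compares two kinds of additive models by mean squared error: the best per-feature additive model, and a two-term model with one term per group $S_1 \cup S_2$ and $S_2 \cup S_3$. This fitting error is only an illustration of why groups matter; it is not the paper's insertion error $\mathrm{InsErr}$, whose definition is not reproduced here.

```python
import itertools

import numpy as np

# Illustrative partition of d = 6 features into three equal blocks.
d = 6
S1, S2, S3 = [0, 1], [2, 3], [4, 5]
A = S1 + S2  # support of the first monomial (group S1 ∪ S2)
B = S2 + S3  # support of the second monomial (group S2 ∪ S3)


def p(x):
    """Binomial from Theorem 2.3: product over S1 ∪ S2 plus product over S2 ∪ S3."""
    return int(all(x[i] for i in A)) + int(all(x[j] for j in B))


X = np.array(list(itertools.product([0, 1], repeat=d)), dtype=float)  # all 2^d inputs
y = np.array([p(row) for row in X.astype(int)], dtype=float)

# Over binary inputs, any per-feature additive model sum_i f_i(x_i) equals
# b + sum_i w_i x_i, so least squares on this design gives the best such model.
design = np.hstack([np.ones((len(X), 1)), X])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
per_feature_mse = float(np.mean((design @ coef - y) ** 2))

# A two-group additive model, one term per group, reproduces p exactly.
group_pred = X[:, A].prod(axis=1) + X[:, B].prod(axis=1)
group_mse = float(np.mean((group_pred - y) ** 2))

print("values of p:", sorted(set(y.tolist())))
print("best per-feature additive MSE:", per_feature_mse)
print("two-group additive MSE:", group_mse)
```

The per-feature model cannot represent the interaction inside each monomial, while groups aligned with the monomials leave no residual, which mirrors the contrast between Theorem 2.3 and the informal Theorem 2.4.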

Figures (19)

  • Figure 1: Sum-of-Parts (SOP) linearly aggregates outputs from multiple feature groups. This maintains performance while ensuring interpretability. SOP first generates groups using a group generator $\Gamma$, predicts with a pre-trained backbone $h$, and aggregates the group predictions with a group selector $\theta$.
  • Figure 2: Errors for per-feature SANNs unavoidably grow fast. The minimum (a) total insertion error of monomials of size $d$ and (b) total deletion error of binomials of size $d$ are minima over all possible per-feature self-explaining models. The dots are lower bounds computed by the solver, and the line is a best-fit exponential function.
  • Figure 3: (ImageNet Sparsity vs. Error $\downarrow$) We report how error increases as sparsity increases (fewer input features in each group). SOP's error increases the slowest, which is the most desirable behavior.
  • Figure 4: (ImageNet Group Probing Accuracy) SOP's powerful group generator is not doing all the work and does not compromise SOP's interpretability. A CNN trained on group masks from SOP performs no better than random (0.1% accuracy), while CNNs trained on masks from MFABA-F, AMPE-F, IG-F, etc. do. RISE and Archipelago are omitted due to their significant computational cost. Results for linear and ViT probing models are in the appendix on information leakage.
  • Figure 5: Example groups from different SANNs for a Beagle in ImageNet; SOP generates groups that are more semantically coherent than those of other SANNs. "-F" indicates self-attributing models converted from post-hoc methods. The highlights show the groups selected by each method, with unused patches hatched out. Each group contains 20% of the features.
  • ...and 14 more figures
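
The figure captions above describe groups by the fraction of input features they contain: 20% per group in Figure 5, and in Figure 3 higher sparsity means fewer features per group. One simple way to enforce such a budget is to keep the top-scoring fraction of patches as a binary mask. The sketch below is an assumption for illustration, not SOP's learned Group Generator: the per-patch scores are random placeholders and the 14×14 patch grid is a hypothetical choice.

```python
import numpy as np


def topk_group_mask(scores, fraction=0.2):
    """Binary mask keeping the highest-scoring `fraction` of features."""
    scores = np.asarray(scores, dtype=float)
    k = max(1, int(round(fraction * scores.size)))
    flat = scores.ravel()
    keep = np.argpartition(flat, -k)[-k:]  # indices of the k largest scores
    mask = np.zeros(flat.size, dtype=bool)
    mask[keep] = True
    return mask.reshape(scores.shape)


rng = np.random.default_rng(0)
patch_scores = rng.random((14, 14))  # hypothetical per-patch scores on a 14x14 grid
mask = topk_group_mask(patch_scores, fraction=0.2)

# Increasing sparsity shrinks each group: fewer patches survive the mask.
sizes = [int(topk_group_mask(patch_scores, f).sum()) for f in (0.5, 0.2, 0.1)]
print("patches kept at 20%:", int(mask.sum()))
print("group sizes at 50%, 20%, 10%:", sizes)
```

Patches outside the mask correspond to the hatched-out regions in Figure 5; the backbone would only see the kept patches of each group.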

Theorems & Definitions (25)

  • Definition 2.1
  • Definition 2.2
  • Theorem 2.3: Lower Bound on Insertion Error for Binomials
  • Theorem 2.4: Informal: Zero Group Insertion and Deletion Error
  • Definition 1.1
  • Theorem 1.2: Lower Bound on Deletion Error for Monomials
  • Proof
  • Conjecture 1.1: Deletion Error for Monomials Grows Exponentially with Dimension
  • Theorem 1.3: Insertion Error for Monomials
  • Proof
  • ...and 15 more