Less is More: Efficient Black-box Attribution via Minimal Interpretable Subset Selection

Ruoyu Chen, Siyuan Liang, Jingzhi Li, Shiming Liu, Li Liu, Hua Zhang, Xiaochun Cao

TL;DR

LiMA tackles faithful input-region attribution for black-box models by treating region selection as a submodular subset selection problem. It introduces a novel four-score submodular objective that balances semantic consistency, collaborative effects, prediction confidence, and regional diversity, and implements a bidirectional greedy search to efficiently identify the most and least important regions. Across eight foundation models and six datasets spanning images, audio, and medical domains, LiMA achieves state-of-the-art fidelity (Deletion/Insertion AUC and $\mu$Fidelity) while using substantially fewer regions and running faster than naive greedy search. The method also proves effective for diagnosing incorrect predictions, revealing error-causing regions with high confidence, and demonstrates robust generalization and scalability as model size and modality complexity increase.
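
The bidirectional greedy search can be illustrated with a minimal sketch. This is not the authors' implementation: the set function is a hypothetical weighted-coverage score (a standard monotone submodular function) standing in for LiMA's four-score objective, and the regions, concepts, and weights are made up. Each pass grows a most-important list by forward greedy selection and a least-important list by backward (removal) greedy selection; the two lists are then concatenated, as the Figure 2 caption describes.

```python
from typing import Callable, FrozenSet, List, Sequence

# Hypothetical toy data: each "region" covers a set of concepts.
REGIONS: List[FrozenSet[str]] = [
    frozenset({"ear", "eye"}),
    frozenset({"eye"}),
    frozenset({"nose", "mouth"}),
    frozenset({"grass"}),
    frozenset({"ear"}),
    frozenset({"sky"}),
]
WEIGHTS = {"ear": 30, "eye": 40, "nose": 20, "mouth": 10, "grass": 2, "sky": 1}


def coverage_score(subset: Sequence[int]) -> int:
    """Toy F(S): summed weight of the concepts covered by the chosen regions."""
    covered = set()
    for idx in subset:
        covered |= REGIONS[idx]
    return sum(WEIGHTS[c] for c in covered)


def bidirectional_greedy(n: int, f: Callable[[Sequence[int]], int]) -> List[int]:
    """Rank n regions from most to least important (illustrative sketch).

    Each iteration places two regions:
      * front: forward greedy, adds the region with the largest gain f(front + [a]) - f(front);
      * back: backward greedy, discards the region whose removal from the
        still-kept pool leaves f highest (the least important one).
    """
    remaining = list(range(n))
    front: List[int] = []
    back: List[int] = []  # least important region is appended first
    while remaining:
        best = max(remaining, key=lambda a: f(front + [a]) - f(front))
        front.append(best)
        remaining.remove(best)
        if not remaining:
            break
        kept = front + remaining  # regions not yet discarded
        worst = max(remaining, key=lambda a: f([r for r in kept if r != a]))
        back.append(worst)
        remaining.remove(worst)
    # Most important first, then the discarded regions in reverse discard order.
    return front + back[::-1]


print(bidirectional_greedy(len(REGIONS), coverage_score))  # [0, 2, 3, 5, 4, 1]
```

Regions 1 and 4 only repeat concepts already covered by region 0, so under a submodular score they add no marginal gain and fall to the end of the ranking. Each outer iteration places two regions, so the loop runs about half as many times as one-sided greedy, although each iteration evaluates F more often; the speedup reported for LiMA comes from the authors' own algorithm and setup, not from this sketch.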

Abstract

Building trustworthy AI systems requires attribution methods that identify the input regions that most influence a model's decisions. The primary task of existing attribution methods is to efficiently and accurately identify the relationships among input-prediction interactions. Particularly when the input data is discrete, such as images, analyzing the relationship between inputs and outputs poses a significant challenge due to combinatorial explosion. In this paper, we propose a novel and efficient black-box attribution mechanism, LiMA (Less input is More faithful for Attribution), which reformulates the attribution of important regions as a submodular subset selection optimization problem. First, to accurately assess interactions, we design a submodular function that quantifies subset importance and effectively captures the impact of subsets on decision outcomes. Then, to efficiently rank input sub-regions by their importance for attribution, we introduce a novel bidirectional greedy search algorithm that improves optimization efficiency. LiMA identifies both the most and least important samples while ensuring an optimal attribution boundary that minimizes errors. Extensive experiments on eight foundation models demonstrate that our method provides faithful interpretations with fewer regions and exhibits strong generalization, showing average improvements of 36.3% in Insertion and 39.6% in Deletion. Our method is also 1.6 times faster than naive greedy search. Furthermore, when explaining the reasons behind model prediction errors, the average highest confidence achieved by our method is 86.1% higher than that of state-of-the-art attribution algorithms. The code is available at https://github.com/RuoyuChen10/LIMA.
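
The Insertion and Deletion scores are areas under confidence curves. The sketch below shows one common way to compute them, assuming a hypothetical `confidence` callback that returns the target-class probability when only the listed regions are visible, and assuming the x-axis is the fraction of regions inserted or deleted; the paper's exact normalization and masking baseline may differ. Higher Insertion AUC and lower Deletion AUC indicate a more faithful ranking, and the maximum of the insertion curve corresponds to the "highest confidence" used when diagnosing errors.

```python
from typing import Callable, List, Sequence, Tuple


def insertion_deletion_curves(
    ranking: Sequence[int],
    confidence: Callable[[Sequence[int]], float],
) -> Tuple[List[float], List[float]]:
    """Insertion: reveal regions most-important-first, starting from an empty input.
    Deletion: remove regions most-important-first, starting from the full input."""
    n = len(ranking)
    insertion = [confidence(ranking[:k]) for k in range(n + 1)]
    deletion = [confidence(ranking[k:]) for k in range(n + 1)]
    return insertion, deletion


def auc(curve: Sequence[float]) -> float:
    """Trapezoidal area under a curve sampled at evenly spaced points on [0, 1]."""
    steps = len(curve) - 1
    return sum((curve[i] + curve[i + 1]) / 2 for i in range(steps)) / steps


# Hypothetical toy model: each region adds a fixed amount of confidence.
CONTRIB = [0.5, 0.25, 0.125, 0.125]


def toy_confidence(regions: Sequence[int]) -> float:
    return min(1.0, sum(CONTRIB[r] for r in regions))


ins, dele = insertion_deletion_curves([0, 1, 2, 3], toy_confidence)
print(auc(ins), auc(dele), max(ins))  # 0.65625 0.34375 1.0
```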

Paper Structure

This paper contains 44 sections, 3 theorems, 38 equations, 17 figures, 18 tables, 2 algorithms.

Key Result

Lemma 1

Consider two subsets $S_{a}$ and $S_{b}$ of a set $V$, where $S_{a} \subseteq S_{b} \subseteq V$, and an element $\alpha \in V \setminus S_{b}$. Assuming that $\alpha$ contributes to the model interpretation, the function $\mathcal{F}(\cdot)$ in Eq. submodular_function is a submodular function.
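
Written with the symbols of Lemma 1, the diminishing-returns property is the standard submodularity condition (shown here in its textbook form; the paper's own equation may carry additional assumptions):

$$\mathcal{F}(S_{a} \cup \{\alpha\}) - \mathcal{F}(S_{a}) \;\geq\; \mathcal{F}(S_{b} \cup \{\alpha\}) - \mathcal{F}(S_{b}), \qquad S_{a} \subseteq S_{b} \subseteq V,\; \alpha \in V \setminus S_{b}.$$

In words, adding a region to a smaller selected set helps at least as much as adding it to a larger one.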

Figures (17)

  • Figure 1: The left panel illustrates the Insertion and Average Highest Confidence metrics for various attribution mechanisms when attributing the model’s correct and incorrect predictions. Our method shows significant improvements across different datasets and models. The right panel shows the attribution maps of different methods, where our approach avoids noise and unnecessary region redundancy.
  • Figure 2: The framework of the proposed LiMA method. We begin by dividing the image into semantic sub-regions, using either superpixel-based methods or the Segment Anything algorithm. Next, we apply a bidirectional greedy search algorithm with the designed submodular function to simultaneously identify the most and least important samples, ranking the sub-regions accordingly. Finally, based on the sub-region rankings, we concatenate the most important sample set with the least important sample set and evaluate the importance of each sub-region using consistency and collaboration scores, yielding an enhanced regional visualization. As measured by the faithfulness metrics, our method identifies a small number of regions that are sufficient to activate the model's response.
  • Figure 3: Statistics of strong interactive response times. A. Impact of pre-training scale and model size. B. Impact of whether the model correctly predicts.
  • Figure 4: Visual explanations of the CLIP model using various attribution mechanisms, with our approach effectively reducing noise and eliminating redundant regions, leading to more interpretable attribution results.
  • Figure 5: Attribution visualizations of decision results for different multimodal foundation models on the ImageNet dataset. The first row shows the interpretation results from the state-of-the-art baseline attribution methods, while the second row displays the interpretation results from our method. Each interpretation result includes the saliency map, the highest confidence score, and its corresponding region, as well as the Insertion AUC curve. The red line in the curve represents the highest confidence of the model’s response during the search.
  • ...and 12 more figures
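
The first and last steps of the Figure 2 pipeline can be sketched under strong simplifications: a fixed square grid stands in for superpixel or Segment Anything sub-regions, and a hypothetical linear rank-decay score stands in for the consistency and collaboration scores that LiMA uses to build its attribution map. The function names are illustrative only.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int]


def grid_regions(height: int, width: int, cell: int) -> List[List[Pixel]]:
    """Split an H x W image into square cells (crude stand-in for SLIC/SAM regions)."""
    regions = []
    for top in range(0, height, cell):
        for left in range(0, width, cell):
            regions.append([
                (y, x)
                for y in range(top, min(top + cell, height))
                for x in range(left, min(left + cell, width))
            ])
    return regions


def saliency_from_ranking(
    height: int, width: int, regions: List[List[Pixel]], ranking: Sequence[int]
) -> List[List[float]]:
    """Paint each region with a score that decays linearly with its rank
    (rank 0 -> 1.0, last rank -> 1/n). Hypothetical scoring rule."""
    n = len(ranking)
    saliency = [[0.0] * width for _ in range(height)]
    for rank, idx in enumerate(ranking):
        score = (n - rank) / n
        for y, x in regions[idx]:
            saliency[y][x] = score
    return saliency


regions = grid_regions(4, 4, 2)  # 4 cells: top-left, top-right, bottom-left, bottom-right
sal = saliency_from_ranking(4, 4, regions, [3, 0, 2, 1])
print(sal[0])  # [0.75, 0.75, 0.25, 0.25]
```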

Theorems & Definitions (11)

  • Definition 1: Submodular function (Edmonds, 1970)
  • Lemma 1: Diminishing returns
  • proof
  • Lemma 2: Monotonically non-decreasing
  • proof
  • Remark 1
  • Theorem 1: Bidirectional greedy search optimality bound
  • proof
  • proof
  • proof
  • ...and 1 more
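
Lemma 1 and Lemma 2 assert diminishing returns and monotonicity for LiMA's function. Both properties can be checked by brute force on small ground sets. The sketch below does so for a hypothetical weighted-coverage function (which has both properties) and for $|S|^2$ (monotone but not submodular). It does not test LiMA's own $\mathcal{F}$, whose lemmas rely on the assumption that $\alpha$ contributes to the interpretation.

```python
from itertools import combinations
from typing import Callable, FrozenSet, List

SetFn = Callable[[FrozenSet[int]], int]


def all_subsets(n: int) -> List[FrozenSet[int]]:
    return [frozenset(c) for k in range(n + 1) for c in combinations(range(n), k)]


def is_monotone(f: SetFn, n: int) -> bool:
    """Lemma 2-style check: S_a subset of S_b implies f(S_a) <= f(S_b)."""
    subs = all_subsets(n)
    return all(f(a) <= f(b) for a in subs for b in subs if a <= b)


def has_diminishing_returns(f: SetFn, n: int) -> bool:
    """Lemma 1-style check: for S_a subset of S_b and alpha not in S_b,
    the gain of alpha on S_a is at least its gain on S_b."""
    subs = all_subsets(n)
    for a in subs:
        for b in subs:
            if not a <= b:
                continue
            for alpha in set(range(n)) - b:
                if f(a | {alpha}) - f(a) < f(b | {alpha}) - f(b):
                    return False
    return True


# Hypothetical toy functions over 4 "regions".
COVERS = [{0, 1}, {1}, {2}, {0, 2}]  # concepts covered by each region
WEIGHT = [3, 5, 2]                    # weight of each concept


def coverage(subset: FrozenSet[int]) -> int:
    covered = set().union(*(COVERS[i] for i in subset))
    return sum(WEIGHT[c] for c in covered)


def square(subset: FrozenSet[int]) -> int:
    return len(subset) ** 2


print(is_monotone(coverage, 4), has_diminishing_returns(coverage, 4))  # True True
print(is_monotone(square, 4), has_diminishing_returns(square, 4))      # True False
```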