EXAGREE: Mitigating Explanation Disagreement with Stakeholder-Aligned Models

Sichao Li, Tommy Liu, Quanling Deng, Amanda S. Barnard

TL;DR

EXAGREE introduces a stakeholder-centered approach to explanation disagreement by operating inside a Rashomon set of near-optimal models and selecting Stakeholder-Aligned Explanation Models (SAEMs) that maximize Stakeholder-Machine Agreement ($\text{SMA}$). The framework combines a differentiable mask-based attribution network (DMAN) with differentiable sorting (DiffSortNet) and a multi-head mask network (MHMN) to explore diverse explanations while respecting a performance constraint. Empirical results on six real-world OpenXAI datasets show gains in faithfulness ($\text{A}_{\text{faith}}$) and plausibility ($\text{A}_{\text{plaus}}$), improved SMA, and reduced subgroup fairness gaps, with an LLM-assisted interface enabling natural-language stakeholder feedback. By turning explanation disagreement into a selection problem, EXAGREE provides a principled, practical path toward stakeholder-centered XAI in safety-critical domains.

Abstract

Conflicting explanations, arising from different attribution methods or model internals, limit the adoption of machine learning models in safety-critical domains. We turn this disagreement into an advantage and introduce EXplanation AGREEment (EXAGREE), a two-stage framework that selects a Stakeholder-Aligned Explanation Model (SAEM) from a set of similar-performing models. The selection maximizes Stakeholder-Machine Agreement (SMA), a single metric that unifies faithfulness and plausibility. EXAGREE couples a differentiable mask-based attribution network (DMAN) with monotone differentiable sorting, enabling gradient-based search inside the constrained model space. Experiments on six real-world datasets demonstrate simultaneous gains in faithfulness, plausibility, and fairness over baselines, while preserving task accuracy. Extensive ablation studies, significance tests, and case studies confirm the robustness and feasibility of the method in practice.
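
The selection idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: EXAGREE searches the model space with gradients (DMAN plus differentiable sorting), whereas this sketch simply enumerates a finite list of candidate models. Several things are assumptions made for illustration: the `Candidate` class and all function names are hypothetical, the Rashomon set is taken to be the models whose log-loss is within `epsilon` of the best one (a common definition, with `epsilon = 0.05` borrowed from the Figure 5 caption), and SMA is approximated by the Spearman rank correlation between a model's attribution ranking and the stakeholder's ranking.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Candidate:
    """Hypothetical container for one trained model (illustration only)."""
    name: str
    log_loss: float            # validation log-loss
    attributions: List[float]  # one importance score per feature


def ranks(scores: Sequence[float]) -> List[int]:
    """Rank features by descending score (1 = most important); ties broken by index."""
    order = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    result = [0] * len(scores)
    for position, feature in enumerate(order, start=1):
        result[feature] = position
    return result


def spearman(rank_a: Sequence[int], rank_b: Sequence[int]) -> float:
    """Spearman correlation of two tie-free rankings (stand-in for SMA, an assumption)."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def rashomon_set(models: Sequence[Candidate], epsilon: float) -> List[Candidate]:
    """Keep models whose loss is within epsilon of the best loss (assumed definition)."""
    best = min(m.log_loss for m in models)
    return [m for m in models if m.log_loss <= best + epsilon]


def select_aligned(models: Sequence[Candidate],
                   stakeholder_rank: Sequence[int],
                   epsilon: float = 0.05) -> Candidate:
    """Pick the near-optimal model whose attribution ranking best matches the stakeholder."""
    candidates = rashomon_set(models, epsilon)
    return max(candidates,
               key=lambda m: spearman(ranks(m.attributions), stakeholder_rank))


# Toy data: three models over three features.
models = [
    Candidate("m0", 0.40, [0.5, 0.3, 0.2]),
    Candidate("m1", 0.42, [0.2, 0.5, 0.3]),
    Candidate("m2", 0.60, [0.1, 0.2, 0.7]),  # matches the stakeholder but is too inaccurate
]
stakeholder_rank = [3, 2, 1]  # the stakeholder considers the last feature most important

selected = select_aligned(models, stakeholder_rank)
print(selected.name)
```

In this toy setting `m2` agrees perfectly with the stakeholder but falls outside the performance constraint, so the selection falls back to the best-aligned model among the near-optimal ones. That is the trade-off the performance constraint is meant to enforce.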

Paper Structure

This paper contains 30 sections, 11 theorems, 32 equations, 13 figures, 13 tables.

Key Result

Lemma 2.3

If $\mathcal{A}_{\text{SMA}}\!<\!1$, no delivered explanation can simultaneously maximize $\mathcal{A}_{\text{faith}}$ and $\mathcal{A}_{\text{plaus}}$ (for the proof, see Appendix Lemma 3.1).
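
The intuition behind the lemma can be checked by brute force on a toy case. The sketch below rests on an assumed reading of the three quantities, since their formal definitions are not reproduced here: all three are taken to be Spearman rank correlations, with faithfulness comparing the delivered ranking to the model's ranking, plausibility comparing it to the stakeholder's ranking, and SMA comparing the model's ranking to the stakeholder's. The function names are hypothetical.

```python
from itertools import permutations
from typing import List, Sequence, Tuple


def spearman(rank_a: Sequence[int], rank_b: Sequence[int]) -> float:
    """Spearman correlation of two tie-free rankings of the same features."""
    n = len(rank_a)
    d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1.0 - 6.0 * d2 / (n * (n * n - 1))


def jointly_optimal(model_rank: Sequence[int],
                    stakeholder_rank: Sequence[int]) -> List[Tuple[int, ...]]:
    """Enumerate every delivered ranking and keep those that reach the maximum
    value 1.0 on both assumed metrics at once."""
    n = len(model_rank)
    return [
        delivered
        for delivered in permutations(range(1, n + 1))
        if spearman(delivered, model_rank) == 1.0         # assumed faithfulness
        and spearman(delivered, stakeholder_rank) == 1.0  # assumed plausibility
    ]


model_rank = (1, 2, 3, 4)
stakeholder_rank = (2, 1, 3, 4)  # disagrees with the model on the top two features

sma = spearman(model_rank, stakeholder_rank)                # below 1 because the rankings differ
conflict = jointly_optimal(model_rank, stakeholder_rank)    # no ranking satisfies both
agreement = jointly_optimal(model_rank, model_rank)         # SMA = 1: one ranking satisfies both
print(sma, conflict, agreement)
```

Under these assumptions a delivered ranking reaches correlation 1 with a target only when it equals that target, so it can match both the model and the stakeholder only when the two already coincide.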

Figures (13)

  • Figure 1: An illustration showing that an explanation relying on a single ML model cannot satisfy all stakeholders.
  • Figure 2: Pareto frontier of faithfulness and plausibility in practice with $\mathcal{A}_{\text{SMA}} > 0$, $\mathcal{A}_{\text{SMA}} = 0$, and $\mathcal{A}_{\text{SMA}} < 0$, indicating the strength of the trade-off.
  • Figure 3: Overview of EXAGREE framework, illustrating the two-stage processes from top-left to bottom-right. Stage 1: Exploring Rashomon Set and Attribution Mapping; Stage 2: Identification of SAEMs under ranking supervision.
  • Figure 4: An illustration of feature attribution distributions from a single model (left) and a model space (right), where the ordering $a_{(1)} \succ a_{(2)}$ can never occur.
  • Figure 5: Performance of sampled models (gray) across all datasets from two canonical backbones: ANN (blue) and LR (red). The vertical dashed line denotes the log-loss threshold $\epsilon = 0.05$ used to define the Rashomon set.
  • ...and 8 more figures

Theorems & Definitions (22)

  • Definition 2.1: Delivered Explanation
  • Definition 2.2: Stakeholder-Machine Agreement
  • Lemma 2.3: Faithfulness and Plausibility Trade‐Off
  • Lemma 2.4: Trade-Off Tension
  • Remark 2.5
  • Lemma 3.1
  • proof
  • Proposition 3.2
  • Lemma 3.3
  • Lemma C.1: Faithfulness and Plausibility Trade‐Off
  • ...and 12 more