Community Detection on Model Explanation Graphs for Explainable AI

Ehsan Moradi

TL;DR

This work introduces Modules of Influence (MoI), a graph-based framework that converts per-instance feature attributions into a feature–feature co-influence graph and then applies community detection to discover modules: groups of features that influence predictions together. MoI provides four module-level auditing metrics (synergy, redundancy, bias exposure, and stability) and an evaluation protocol on synthetic and real tabular data, supporting targeted interventions, fairness auditing, and model compression without sacrificing predictive performance. By shifting focus from individual feature importance to meso-scale communities, MoI supports more stable debugging, interpretable governance actions, and concrete guidance on bias localization and data collection priorities. The approach is attribution-agnostic, scalable, and designed with responsible release in mind, and it includes visualization and reporting tools that turn graph structure into dashboards for practitioners and policymakers.

Abstract

Feature-attribution methods (e.g., SHAP, LIME) explain individual predictions but often miss higher-order structure: sets of features that act in concert. We propose Modules of Influence (MoI), a framework that (i) constructs a model explanation graph from per-instance attributions, (ii) applies community detection to find feature modules that jointly affect predictions, and (iii) quantifies how these modules relate to bias, redundancy, and causality patterns. Across synthetic and real datasets, MoI uncovers correlated feature groups, improves model debugging via module-level ablations, and localizes bias exposure to specific modules. We release stability and synergy metrics, a reference implementation, and evaluation protocols to benchmark module discovery in XAI.
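Steps (i) and (ii) of the abstract can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the paper's reference implementation: the edge rule (absolute Pearson correlation between attribution columns, thresholded at 0.5), the use of NetworkX's greedy modularity algorithm as the community detector, and the hypothetical helper names `co_influence_graph` and `detect_modules`. The attribution matrix is synthetic and stands in for SHAP or LIME output.

```python
import numpy as np
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities


def co_influence_graph(attributions, threshold=0.5):
    """Build a feature-feature graph from an (n_instances, n_features) matrix.

    Assumed edge rule: weight = |Pearson correlation| between the attribution
    columns of two features; edges below `threshold` are dropped.
    """
    weights = np.abs(np.corrcoef(attributions, rowvar=False))
    n_features = weights.shape[0]
    graph = nx.Graph()
    graph.add_nodes_from(range(n_features))
    for i in range(n_features):
        for j in range(i + 1, n_features):
            if weights[i, j] >= threshold:
                graph.add_edge(i, j, weight=float(weights[i, j]))
    return graph


def detect_modules(graph):
    """Return feature modules as sorted lists of feature indices."""
    communities = greedy_modularity_communities(graph, weight="weight")
    return [sorted(c) for c in communities]


# Synthetic attributions: features 0-2 follow one latent factor and
# features 3-5 follow another, plus small independent noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 2))
noise = 0.1 * rng.normal(size=(500, 6))
attributions = np.column_stack([latent[:, 0]] * 3 + [latent[:, 1]] * 3) + noise

graph = co_influence_graph(attributions)
modules = detect_modules(graph)
print(modules)
```

On this toy input the two planted groups should come back as separate modules. With real attributions the result depends heavily on the edge rule and threshold, so both would need to be chosen with care.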

Paper Structure

This paper contains 77 sections, 7 equations, 8 figures, 4 tables, 3 algorithms.

Figures (8)

  • Figure 1: Explanation graph colored by discovered modules (left); reordered affinity matrix $W$ (right). Coherent blocks indicate domain-aligned communities.
  • Figure 2: Fairness dashboard: $\mathrm{BEI}$ per module with CIs (left); disparity before/after attenuating top-$\mathrm{BEI}$ modules (right).
  • Figure 3: Stability–utility trade-off: $\mathrm{MSI}$ vs. variance of ablation drops (left); $\mathrm{MSI}$ across edge rules (right).
  • Figure 4: Explanation graph colored by modules; edge width $\propto |W_{ij}|$. Dashed edges indicate negative correlations (signed view).
  • Figure 5: Sankey: feature$\rightarrow$module$\rightarrow$output contributions (magnitude view); class-conditional variant in inset.
  • ...and 3 more figures