Community Detection on Model Explanation Graphs for Explainable AI
Ehsan Moradi
TL;DR
This work introduces Modules of Influence (MoI), a graph-based framework that converts per-instance feature attributions into a feature–feature co-influence graph and then applies community detection to group features into modules. MoI provides four module-level auditing metrics (synergy, redundancy, bias exposure, and stability) and an evaluation protocol on synthetic and real tabular data. Together these support targeted interventions, fairness auditing, and model compression without sacrificing predictive performance. By shifting focus from individual feature importance to meso-scale communities, MoI supports more stable debugging, interpretable governance actions, and actionable guidance on bias localization and data-collection priorities. The approach is attribution-agnostic and scalable, is designed with responsible release in mind, and includes visualization and reporting tools that translate complex graph structures into decision-ready dashboards for practitioners and policymakers.
Abstract
Feature-attribution methods (e.g., SHAP, LIME) explain individual predictions but often miss higher-order structure: sets of features that act in concert. We propose Modules of Influence (MoI), a framework that (i) constructs a model explanation graph from per-instance attributions, (ii) applies community detection to find feature modules that jointly affect predictions, and (iii) quantifies how these modules relate to bias, redundancy, and causal patterns. Across synthetic and real datasets, MoI uncovers correlated feature groups, improves model debugging via module-level ablations, and localizes bias exposure to specific modules. We release stability and synergy metrics, a reference implementation, and evaluation protocols to benchmark module discovery in XAI.
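Steps (i) and (ii) can be illustrated with a minimal sketch. It is not the released reference implementation, and several choices are assumptions made only for illustration: edge weights are taken as the absolute Pearson correlation between two features' attributions across instances, edges below a hypothetical `threshold` are dropped, and connected components stand in for a proper community-detection algorithm (e.g., modularity-based methods). The function names and the toy attribution matrix are likewise hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def co_influence_graph(attributions, threshold=0.5):
    """Build weighted edges {(i, j): weight} between features.

    attributions: list of rows, one row per instance, one column per feature.
    Assumption: weight = |Pearson correlation| of the two attribution columns.
    """
    n_features = len(attributions[0])
    columns = [[row[j] for row in attributions] for j in range(n_features)]
    edges = {}
    for i, j in combinations(range(n_features), 2):
        weight = abs(pearson(columns[i], columns[j]))
        if weight >= threshold:
            edges[(i, j)] = weight
    return edges


def find_modules(n_features, edges):
    """Group features into modules via connected components (union-find).

    This is a simple stand-in for community detection, not the paper's method.
    """
    parent = list(range(n_features))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in edges:
        parent[find(i)] = find(j)

    groups = {}
    for i in range(n_features):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy attribution matrix (made up for illustration): 4 instances x 4 features.
# Features 0 and 1 co-vary; features 2 and 3 co-vary with opposite sign.
toy_attributions = [
    [0.5, 0.4, 0.3, -0.2],
    [-0.5, -0.5, 0.3, -0.3],
    [0.5, 0.5, -0.3, 0.3],
    [-0.5, -0.4, -0.3, 0.2],
]

toy_edges = co_influence_graph(toy_attributions, threshold=0.5)
toy_modules = find_modules(4, toy_edges)
print(toy_modules)
```

On this toy input, features 0 and 1 fall into one module and features 2 and 3 into another. Taking the absolute value treats strongly anti-correlated attributions as co-influence, which is one possible design choice; a signed graph would separate the two cases.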
