Always Keep Your Promises: DynamicLRP, A Model-Agnostic Solution To Layer-Wise Relevance Propagation

Kevin Lee, Pablo Millan Arias

TL;DR

LRP has struggled with generality across evolving architectures due to module-level rules. DynamicLRP advances explainability by performing operation-level relevance propagation on computation graphs and introducing the Promise System to retrieve missing activations without modifying models, achieving architecture-agnostic LRP with theoretical guarantees. Empirically, DynamicLRP attains faithful attributions across vision, language, and audio tasks and achieves near-complete graph coverage while maintaining practical efficiency. This framework lays a sustainable foundation for LRP across diverse architectures and deep-learning libraries.

Abstract

Layer-wise Relevance Propagation (LRP) provides principled attribution for neural networks through conservation properties and foundations in Deep Taylor Decomposition. However, existing implementations operate at the module level, requiring architecture-specific propagation rules and model modifications. These requirements limit the range of models that can be explained and make implementations hard to sustain as architectures evolve. We introduce DynamicLRP, a model-agnostic LRP framework operating at the tensor operation level. By decomposing attribution to individual operations within computation graphs and introducing a novel mechanism for deferred activation resolution, named the Promise System, our approach achieves true architecture agnosticity while maintaining LRP's theoretical guarantees. This design operates independently of backpropagation machinery, requires no model modification, and can run side by side with gradient backpropagation. Because it is based on computation graphs, the method is theoretically extensible to other deep learning libraries that support auto-differentiation. We demonstrate faithfulness matching or exceeding specialized implementations (1.77 vs 1.69 ABPC on VGG, equivalent performance on ViT, 93.70% and 95.06% top-1 attribution accuracy for explaining RoBERTa-large and Flan-T5-large answers on SQuADv2, respectively) while maintaining practical efficiency on models with 100M-1B parameters. We achieve 99.92% node coverage across 31,465 computation graph nodes from 15 diverse architectures, including state-space models (Mamba), audio transformers (Whisper), and multimodal systems (DePlot), without any model-specific code, using rules implemented for 47 fundamental operations. Our operation-level decomposition and Promise System establish a sustainable, extensible foundation for LRP across evolving architectures. All code is available at https://github.com/keeinlev/dynamicLRP .
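To make the conservation property mentioned in the abstract concrete, here is a minimal sketch of the standard LRP-epsilon rule applied to a single bias-free linear operation. It is not taken from the DynamicLRP codebase: the function name `lrp_epsilon_linear`, the toy weights, and the choice of the epsilon rule are all assumptions made for illustration.

```python
def lrp_epsilon_linear(x, W, relevance_out, eps=1e-9):
    """Redistribute relevance through z = W @ x (no bias) with the LRP-epsilon rule.

    Hypothetical helper for illustration; not part of DynamicLRP's API.
    """
    # Forward pass of the operation: z_j = sum_i W[j][i] * x[i]
    z = [sum(w_ji * x_i for w_ji, x_i in zip(row, x)) for row in W]
    relevance_in = [0.0] * len(x)
    for j, row in enumerate(W):
        # The stabilizer keeps the sign of z_j and avoids division by zero.
        denom = z[j] + (eps if z[j] >= 0 else -eps)
        for i, w_ji in enumerate(row):
            # Input i receives the share of R_j given by its contribution x_i * w_ji.
            relevance_in[i] += x[i] * w_ji / denom * relevance_out[j]
    return relevance_in


# Toy values (assumed): 3 inputs, 2 outputs.
x = [1.0, 2.0, -1.0]
W = [[0.5, -1.0, 2.0], [1.5, 0.25, 0.0]]
relevance_out = [1.0, 0.5]
relevance_in = lrp_epsilon_linear(x, W, relevance_out)

# Conservation: total relevance entering the operation matches the total leaving it,
# up to the small distortion introduced by eps.
print(sum(relevance_in), sum(relevance_out))
```

With no bias term and a small `eps`, the two printed totals agree to within numerical tolerance, which is the per-operation form of the conservation property an operation-level scheme has to preserve.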

Paper Structure

This paper contains 37 sections, 1 theorem, 12 equations, 6 figures, 6 tables, 4 algorithms.

Key Result

Theorem 1

Let $G = (V, E)$ be the computation graph of a neural network with $n = |V|$ operations, $m = |E|$ edges, promise-generating set $V_P$, and maximum promise depth $D$. Let $C_{fwd}, C_{bwd}$ be the most expensive forward and backward pass computation steps, respectively. Let $S$ be the size of the la

Figures (6)

  • Figure 1: ImageNette-320 example
  • Figure 2: Zennit LRP Attributions
  • Figure 3: Our LRP Attributions
  • Figure 4: Visualizing the difference between a module-level and operation-level approach for a simplified Transformer block. The rightmost diagram represents some fabricated toy example architecture that shares the same operation set as the Transformer, and thus does not require any new implementation to be covered by operation-level LRP, but would need new rules and configurations under module-level LRP.
  • Figure 5: Comparing methods for VGG attributions.
  • ...and 1 more figure
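The point of Figure 4 can be sketched in a few lines: if rules are keyed by tensor operation rather than by module, any architecture built from the same operation set is covered without new code. The registry below is purely illustrative. The operation names, the two op lists, and the `node_coverage` helper are assumptions, not the paper's actual rule set or graph format.

```python
# Hypothetical registry of operation-level rules (names are illustrative only).
# The values stand in for real propagation rules, which are omitted here.
RULES = {
    "matmul": "rule_matmul",
    "add": "rule_add",
    "softmax": "rule_softmax",
    "layernorm": "rule_layernorm",
    "gelu": "rule_gelu",
}


def node_coverage(graph_ops, rules):
    """Fraction of computation-graph nodes whose operation has a registered rule."""
    covered = sum(1 for op in graph_ops if op in rules)
    return covered / len(graph_ops)


# A simplified Transformer block, flattened to its operations (assumed ordering).
transformer_block = ["matmul", "softmax", "matmul", "add", "layernorm",
                     "matmul", "gelu", "matmul", "add"]

# A made-up architecture that reuses the same operations in a different layout.
toy_block = ["matmul", "gelu", "add", "softmax", "matmul", "layernorm"]

# A graph containing an operation with no registered rule.
partial_block = ["matmul", "custom_op"]

print(node_coverage(transformer_block, RULES))
print(node_coverage(toy_block, RULES))
print(node_coverage(partial_block, RULES))
```

Both the Transformer block and the toy architecture reach full coverage from the same five rules, whereas a module-level approach would need a new configuration for the second one. Only a genuinely new operation, such as `custom_op` here, lowers coverage and calls for an additional rule.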

Theorems & Definitions (10)

  • Definition 1: Traversal Heuristic
  • Definition 2: Promise
  • Definition 3: Promise-Generating Operations
  • Definition 4: Promise Depth
  • Theorem 1: Promise-Based LRP Complexity
  • proof
  • Definition 5: Promise Density
  • proof
  • proof
  • proof