DLBacktrace: A Model Agnostic Explainability for any Deep Learning Models

Vinay Kumar Sankarapu, Chintan Chitroda, Yashwardhan Rathore, Neeraj Kumar Singh, Pratinav Seth

TL;DR

This work tackles the opacity of deep learning by introducing DLBacktrace, a model-agnostic explainability method that traces relevance from output back to input across CNNs, transformers, and other architectures. It presents two modes, Default and Contrastive, and extends relevance propagation to attention layers, delivering deterministic, architecture-agnostic explanations with an accompanying PyTorch/TensorFlow library. Through benchmarking on tabular (Lending Club), image (CIFAR-10 with ResNet-34), and text (SST-2 with BERT) tasks, DLBacktrace demonstrates robust, informative explanations that improve interpretability and support trust and regulatory compliance, albeit with higher computational cost in some settings. The paper discusses practical implications for network analysis, feature importance, and fairness, and points to future work in scaling to complex transformers, multimodal models, and real-time deployment.

Abstract

The rapid growth of AI has led to more complex deep learning models, often operating as opaque "black boxes" with limited transparency in their decision-making. This lack of interpretability poses challenges, especially in high-stakes applications where understanding model output is crucial. This work highlights the importance of interpretability in fostering trust, accountability, and responsible deployment. To address these challenges, we introduce DLBacktrace, a novel, model-agnostic technique designed to provide clear insights into deep learning model decisions across a wide range of domains and architectures, including MLPs, CNNs, and Transformer-based LLMs. We present a comprehensive overview of DLBacktrace and benchmark its performance against established interpretability methods such as SHAP, LIME, and GradCAM. Our results demonstrate that DLBacktrace effectively enhances understanding of model behavior across diverse tasks. DLBacktrace is compatible with models developed in both PyTorch and TensorFlow, supporting architectures such as BERT, ResNet, U-Net, and custom DNNs for tabular data. The library is open-sourced and available at https://github.com/AryaXAI/DLBacktrace.

Paper Structure

This paper contains 54 sections, 11 equations, 18 figures, 3 tables, and 3 algorithms.

Figures (18)

  • Figure 1: DLBacktrace Workflow: Generating Fine-Grained Explanations from Pre-Trained Models and Test Time Instance.
  • Figure 2: Illustration Depicting DLBacktrace Calculation for a Sample Network.
  • Figure 3: Illustration of Explanations of a Correctly Classified Sample from the Lending Club Dataset, where the Loan was Fully Paid and the MLP Predicted Fully Paid.
  • Figure 4: Visualizing ResNet's decisions on a Truck image from the CIFAR-10 Dataset using various explanation methods.
  • Figure 5: Explanations by different methods of the model's decision making for Sentiment Analysis on a sample from the SST Dataset. Input Text: "The emotions are raw and will strike a nerve with anyone who ever had family trauma." Prediction: 1; Label: 1.
  • ...and 13 more figures