Explaining Deep Neural Networks by Leveraging Intrinsic Methods

Biagio La Rosa

TL;DR

This work tackles the opacity of deep neural networks by advancing intrinsic explainability through self-explainable architectures, memory-augmented modules, and latent-space analyses. It introduces Prototype-based Interpretable Graph Neural Networks (PIGNN), memory-tracking mechanisms in Simplified Differentiable Neural Computers (SDNC) and Memory Wrap, Graph Concept Whitening (GCW), and Clustered Compositional Explanations (CoEx), each designed to preserve or enhance predictive performance while delivering faithful explanations. It also surveys how explanations are integrated into visual analytics systems to reach end-users and researchers alike. Collectively, the contributions yield interpretable, faithful, and actionable insights for developers, researchers, and practitioners, enabling debugging, bias detection, and trustworthy AI deployment across graph, sequential, and vision tasks. The work emphasizes a practical balance between interpretability and performance, providing quantitative and qualitative evidence that intrinsic explanations can rival post-hoc methods while offering richer, more usable explanations for real-world use cases, including drug discovery and computer vision applications. In terms of notation, $f_{FE}$ and $f_{CL}$ denote the feature extractor and the classifier, $P$ the prototypes, $M$ the memory, and $c_j$ the concept axes; the equations and design choices are grounded in $IoU$, fidelity metrics, and memory-tracking mappings that connect internal states to human-interpretable explanations.

Abstract

Despite their impact on society, deep neural networks are often regarded as black-box models due to their intricate structures and the absence of explanations for their decisions. This opacity poses a significant challenge to the wider adoption and trustworthiness of AI systems. This thesis addresses this issue by contributing to the field of eXplainable AI, focusing on enhancing the interpretability of deep neural networks. The core contributions lie in introducing novel techniques that make these networks more interpretable by leveraging an analysis of their inner workings. Specifically, the contributions are threefold. First, the thesis introduces designs for self-explanatory deep neural networks, such as the integration of external memory for interpretability purposes and the use of prototype and constraint-based layers across several domains. Second, it presents novel investigations of neurons within trained deep neural networks, shedding light on overlooked phenomena related to their activation values. Lastly, the thesis analyzes the application of explanatory techniques in the field of visual analytics, exploring the maturity of their adoption and the potential of these systems to convey explanations to users effectively.

Paper Structure (131 sections, 95 equations, 45 figures, 33 tables)

Figures (45)

  • Figure 1: An example of a 3-layer neural network. Each node in the hidden layers is connected with all the nodes of the previous layer and all the nodes of the next layer.
  • Figure 2: A simplified version of an artificial neuron.
  • Figure 3: An unfolded view of an LSTM cell.
  • Figure 4: The architecture of PIGNN when the task is graph classification.
  • Figure 5: Average fidelity and standard deviation achieved by the intrinsic method, DeepLIFT, and GNNExplainer on GCN (a) and GAT (b) models across several datasets.
  • ...and 40 more figures

Theorems & Definitions (12)

  • Definition 2.2.1
  • Definition 2.2.2
  • Definition 2.2.3
  • Definition 2.2.4
  • Definition 2.2.5
  • Definition 2.2.6
  • Definition 2.2.7
  • Definition 2.2.8
  • Definition 2.2.9
  • Definition 2.2.10
  • ...and 2 more