Table of Contents
Fetching ...

Explaining Representation by Mutual Information

Lifeng Gu

TL;DR

A mutual information (MI)-based method that decomposes neural network representations into three exhaustive components: total mutual information, decision-related information, and redundant information is proposed, which captures the entire input-representation relationship.

Abstract

As interpretability gains attention in machine learning, there is a growing need for reliable models that fully explain representation content. We propose a mutual information (MI)-based method that decomposes neural network representations into three exhaustive components: total mutual information, decision-related information, and redundant information. This theoretically complete framework captures the entire input-representation relationship, surpassing partial explanations like those from Grad-CAM. Using two lightweight modules integrated into architectures such as CNNs and Transformers,we estimate these components and demonstrate their interpretive power through visualizations on ResNet and prototype network applied to image classification and few-shot learning tasks. Our approach is distinguished by three key features: 1. Rooted in mutual information theory, it delivers a thorough and theoretically grounded interpretation, surpassing the scope of existing interpretability methods. 2. Unlike conventional methods that focus on explaining decisions, our approach centers on interpreting representations. 3. It seamlessly integrates into pre-existing network architectures, requiring only fine-tuning of the inserted modules.

Explaining Representation by Mutual Information

TL;DR

A mutual information (MI)-based method that decomposes neural network representations into three exhaustive components: total mutual information, decision-related information, and redundant information is proposed, which captures the entire input-representation relationship.

Abstract

As interpretability gains attention in machine learning, there is a growing need for reliable models that fully explain representation content. We propose a mutual information (MI)-based method that decomposes neural network representations into three exhaustive components: total mutual information, decision-related information, and redundant information. This theoretically complete framework captures the entire input-representation relationship, surpassing partial explanations like those from Grad-CAM. Using two lightweight modules integrated into architectures such as CNNs and Transformers,we estimate these components and demonstrate their interpretive power through visualizations on ResNet and prototype network applied to image classification and few-shot learning tasks. Our approach is distinguished by three key features: 1. Rooted in mutual information theory, it delivers a thorough and theoretically grounded interpretation, surpassing the scope of existing interpretability methods. 2. Unlike conventional methods that focus on explaining decisions, our approach centers on interpreting representations. 3. It seamlessly integrates into pre-existing network architectures, requiring only fine-tuning of the inserted modules.

Paper Structure

This paper contains 13 sections, 11 equations, 6 figures.

Figures (6)

  • Figure 1: module architecture: the Infomax estimator processes local inputs $X_i$ and representation $Z$ to estimate $I(Z, X_i)$, while the mask module generates $\lambda_i$ to filter $X$ into $X'$, enabling $I(Z, X_i')$ computation.
  • Figure 2: heatmap examples
  • Figure 3: images
  • Figure 4: total information heat map
  • Figure 5: a mixed figure combining the total information heat map and the original image
  • ...and 1 more figures