
VITAL: More Understandable Feature Visualization through Distribution Alignment and Relevant Information Flow

Ada Gorgun, Bernt Schiele, Jonas Fischer

TL;DR

VITAL tackles the challenge of unclear and artifact-prone feature visualizations by reframing FV as a distribution-alignment problem between the features of generated images and those of real images. It introduces a differentiable sort-matching loss and incorporates relevance scores (via LRP) to filter out irrelevant activations, enabling scalable optimization across modern architectures, including CNNs and Vision Transformers. Through extensive qualitative, quantitative, and human-interpretability evaluations, VITAL demonstrates cleaner, more faithful visualizations that better reveal the information encoded by target neurons and concepts. The approach promises practical benefits for mechanistic interpretability and safety-critical domains by providing clearer, human-understandable explanations of neural representations while maintaining fidelity to the model's reasoning.
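To make the sort-matching idea concrete, here is a minimal NumPy sketch. It assumes features are flattened to a (channels, positions) array and that the loss is a per-channel mean squared error between the sorted generated activations and the sorted reference activations, in the spirit of sliced-Wasserstein-style sort matching. The function name `sort_matching_loss`, the quantile resampling used when the two sets differ in size, and the hand-written gradient are illustrative assumptions, not VITAL's released code; an autodiff framework would backpropagate through the sort automatically.

```python
import numpy as np


def sort_matching_loss(gen_feats, ref_feats):
    """Per-channel sort-matching loss and its gradient w.r.t. gen_feats.

    gen_feats: (C, N) activations of the generated image (C channels, N positions).
    ref_feats: (C, M) activations pooled from real reference images.
    Returns (loss, grad), where grad has the same shape as gen_feats.
    """
    gen_feats = np.asarray(gen_feats, dtype=float)
    ref_sorted = np.sort(np.asarray(ref_feats, dtype=float), axis=1)
    n, m = gen_feats.shape[1], ref_sorted.shape[1]
    if m != n:
        # Assumption: resample the reference quantiles to N points so that
        # both sorted sequences have equal length.
        q_ref = np.linspace(0.0, 1.0, m)
        q_gen = np.linspace(0.0, 1.0, n)
        ref_sorted = np.stack([np.interp(q_gen, q_ref, row) for row in ref_sorted])

    # Sorting pairs the k-th smallest generated value with the k-th smallest
    # reference value, which is the optimal 1-D transport matching.
    order = np.argsort(gen_feats, axis=1)
    gen_sorted = np.take_along_axis(gen_feats, order, axis=1)
    diff = gen_sorted - ref_sorted
    loss = np.mean(diff ** 2)

    # Sorting is a permutation, so the gradient of the sorted values is
    # scattered back to the original positions (valid when there are no ties).
    grad = np.empty_like(gen_feats)
    np.put_along_axis(grad, order, 2.0 * diff / diff.size, axis=1)
    return loss, grad


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    ref = rng.normal(loc=2.0, scale=0.5, size=(3, 500))  # toy "real" features
    gen = rng.normal(size=(3, 100))                      # toy generated features
    for step in range(201):
        loss, grad = sort_matching_loss(gen, ref)
        if step % 50 == 0:
            print(f"step {step:3d}  loss {loss:.6f}")
        gen -= 50.0 * grad  # plain gradient descent on the features
```

Because the loss only compares sorted values, permuting the spatial positions of the generated activations leaves it unchanged: it constrains the distribution of each channel rather than where a feature appears in the image.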

Abstract

Neural networks are widely adopted to solve complex and challenging tasks. Especially in high-stakes decision-making, understanding their reasoning process is crucial, yet this proves challenging for modern deep networks. Feature visualization (FV) is a powerful tool to decode what information neurons respond to and hence to better understand the reasoning behind such networks. In particular, FV generates human-understandable images that reflect the information detected by neurons of interest. However, current methods often yield unrecognizable visualizations with repetitive patterns and visual artifacts, which are hard for humans to interpret. To address these problems, we propose to guide FV through statistics of real image features combined with measures of relevant network flow to generate prototypical images. Our approach yields human-understandable visualizations that improve, both qualitatively and quantitatively, over state-of-the-art FVs across various architectures. As such, it can be used to decode which information the network uses, complementing mechanistic circuits that identify where it is encoded. Code is available at: https://github.com/adagorgun/VITAL
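The "relevant network flow" component can be illustrated with the standard LRP-ε rule for a single dense layer, followed by a filter that discards activations receiving no positive relevance from the target neuron (compare the ear versus background example in Figure 2). This is a minimal NumPy sketch under simplifying assumptions: a real network would propagate relevance through convolutional or attention blocks with dedicated LRP rules, and the masking rule in the hypothetical `relevance_mask` helper is our own choice, not necessarily VITAL's exact filtering criterion.

```python
import numpy as np


def lrp_epsilon_dense(a, W, b, R_out, eps=1e-6):
    """LRP-epsilon rule for one dense layer z = a @ W + b.

    a: (n_in,) input activations; W: (n_in, n_out); b: (n_out,)
    R_out: (n_out,) relevance at the layer output.
    Returns R_in: (n_in,) relevance redistributed to the inputs,
    R_i = sum_j a_i * W[i, j] / (z_j + eps * sign(z_j)) * R_j.
    """
    z = a @ W + b
    stabilizer = eps * np.where(z >= 0, 1.0, -1.0)  # keeps the division away from 0
    s = R_out / (z + stabilizer)                     # relevance per unit of z_j
    return a * (W @ s)


def relevance_mask(a, R_in):
    """Hypothetical filter: keep activations with positive relevance, zero the rest."""
    return np.where(R_in > 0, a, 0.0)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    a = np.maximum(rng.normal(size=8), 0.0)  # ReLU-like input activations
    W = rng.normal(size=(8, 4))
    b = np.zeros(4)
    target = 2                                # hypothetical target neuron
    R_out = np.zeros(4)
    R_out[target] = (a @ W + b)[target]       # start LRP from the target's output
    R_in = lrp_epsilon_dense(a, W, b, R_out)
    print("relevance:", np.round(R_in, 3))
    print("filtered activations:", np.round(relevance_mask(a, R_in), 3))
```

With zero bias and a small ε, the input relevances sum approximately to the relevance at the output; this conservation property is what lets relevance separate activations that matter for the target neuron from those that are merely large.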

Paper Structure

This paper contains 35 sections, 4 equations, 41 figures, 5 tables, 1 algorithm.

Figures (41)

  • Figure 1: Feature visualization through distribution matching. Unlike traditional feature visualization (FV) methods, which often produce artifacts or repetitive patterns, VITAL generates more understandable visualizations. Our approach scales effectively to modern architectures (rows), generalizes well across diverse classes (columns), and better captures meaningful network representations.
  • Figure 2: Importance of relevance scores. Given an image selected by its relevance to a neuron detecting an ear, we apply LRP from this neuron back to a preceding building block. For two neurons from this block, we show the activation map (first column) and the relevance map (second column). While activations are high on the ear (top neuron) and on the background (bottom neuron), only the ear is relevant for the target neuron. Comparing VITAL with and without LRP (third column) shows that incorporating relevance helps avoid visualizing irrelevant features.
  • Figure 3: Example class visualizations. We show ImageNet class visualizations for (a) ResNet50 and (b) ViT-L-32. MACO and Fourier-based FV (top rows) produce repetitive, hard-to-interpret patterns. DeepInversion (3rd row) improves interpretability but introduces artifacts and is not compatible with ViTs. VITAL arguably yields much more interpretable visualizations. For additional results, including failure cases, we refer to the supplementary experiments.
  • Figure 4: Visualizations of small "circuits". We visualize small "circuits" for three ImageNet classes. For each class, we identify the three most relevant neurons in the penultimate layer based on LRP relevance scores; the neuron ID is indicated on the arrow. For each neuron, we label its likely meaning and show FVs by MACO and VITAL. The Zebra class depends strongly on the characteristic fur pattern, the dog class depends on specific color patterns that help distinguish among the many dog classes in ImageNet, and the Pineapple class shows signs of overfitting, being associated with collections of fruits.
  • Figure 5: t-SNE projection of embeddings. We show a low-dimensional t-SNE embedding of the penultimate-layer features for five dog breeds, indicated by color. Transparent circles denote original training images; FVs are indicated by symbols: ■ VITAL, ▲ MACO, ⬢ Fourier, ★ DeepInv.
  • ...and 36 more figures
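The neuron selection behind Figure 4 can be sketched for the simple case of a final linear classifier: starting LRP-0 from the class logit (so the output relevance equals the logit), the relevance of penultimate neuron i for class c reduces to a_i · W[i, c], and a bias term, if present, simply keeps its own share. The sketch below averages this relevance over images of the class and returns the top-k neurons; the function name `top_relevant_neurons` and the averaging step are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np


def top_relevant_neurons(feats, W, class_idx, k=3):
    """Rank penultimate-layer neurons by mean LRP-0 relevance for one class.

    feats: (n_images, d) penultimate activations for images of the class.
    W: (d, n_classes) weights of the final linear layer.
    Starting LRP-0 at the class logit (R_c = z_c), neuron i receives
    a_i * W[i, c]; any bias keeps its own share and is ignored here.
    """
    relevance = feats * W[:, class_idx]    # (n_images, d), broadcast over images
    mean_rel = relevance.mean(axis=0)      # average relevance per neuron
    top = np.argsort(mean_rel)[::-1][:k]   # k neurons with the largest mean relevance
    return top, mean_rel[top]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = np.maximum(rng.normal(size=(16, 32)), 0.0)  # toy ReLU features
    W = rng.normal(size=(32, 10))
    ids, scores = top_relevant_neurons(feats, W, class_idx=4)
    print("most relevant neurons:", ids, "scores:", np.round(scores, 3))
```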