Interpreting Neurons in Deep Vision Networks with Language Models

Nicholas Bai, Rahul A. Iyer, Tuomas Oikarinen, Akshay Kulkarni, Tsui-Wei Weng

TL;DR

The paper introduces Describe-and-Dissect (DnD), a training-free framework to automatically label the functions of hidden neurons in deep vision networks by leveraging multimodal models to generate natural language explanations. DnD eliminates the need for labeled concept sets and outperforms baselines such as Network Dissection, CLIP-Dissect, and MILAN in both qualitative and crowdsourced evaluations, producing richer and sometimes multi-concept neuron descriptions. The method combines an attention-cropping probing strategy, BLIP captioning, GPT-based concept summarization, and Stable Diffusion–driven synthetic images with a principled scoring function to select the best concept. A land-cover prediction use case demonstrates practical utility, revealing interpretable concept groupings and spurious correlations, and enabling targeted pruning of uninterpretable neurons to preserve or improve accuracy. Overall, DnD showcases the potential of modular multimodal models to generate faithful, scalable explanations that can enhance trust and safety in AI systems.

Abstract

In this paper, we propose Describe-and-Dissect (DnD), a novel method to describe the roles of hidden neurons in vision networks. DnD utilizes recent advancements in multimodal deep learning to produce complex natural language descriptions, without the need for labeled training data or a predefined set of concepts to choose from. Additionally, DnD is training-free, meaning we don't train any new models and can easily leverage more capable general purpose models in the future. We have conducted extensive qualitative and quantitative analysis to show that DnD outperforms prior work by providing higher quality neuron descriptions. Specifically, our method on average provides the highest quality labels and is more than 2$\times$ as likely to be selected as the best explanation for a neuron than the best baseline. Finally, we present a use case providing critical insights into land cover prediction models for sustainability applications. Our code and data are available at https://github.com/Trustworthy-ML-Lab/Describe-and-Dissect.

Paper Structure

This paper contains 50 sections, 1 equation, 27 figures, and 13 tables.

Figures (27)

  • Figure 1: Neuron descriptions provided by our method (DnD) and the baselines CLIP-Dissect [oikarinen2023clipdissect], MILAN [hernandez2022natural], and Network Dissection [bau2017network] for random neurons from ResNet-50 trained on ImageNet. We have added the average quality rating from our Amazon Mechanical Turk experiment, described in Section \ref{sec:Mturk_results}, next to each label and color-coded the neuron descriptions by whether we believed they were accurate, somewhat correct, or vague/imprecise.
  • Figure 2: Overview of Describe-and-Dissect (DnD) algorithm. For a given target model, DnD consists of three important steps to identify neuron concepts (e.g. 'Swimming Shark' for neuron $n$).
  • Figure 3: Concept Selection (Step 3) improves the accuracy of Concept Generation (Step 2). We show that validating candidate concepts during concept selection improves on the output of concept generation alone.
  • Figure 4: Layer 2 Concept Profile. We cluster neurons with similar concepts and categorize them into 6 NAIP superclasses. Interpretable concepts have more neurons associated with them. Some superclasses do not appear due to dataset bias or intrinsic similarities between classes.
  • Figure 5: Detailed Visualization of Attention Cropping Pipeline. All three steps of attention cropping are shown for Layer 2 Neuron 165. Steps 1 and 2 illustrate the derivation of bounding boxes from salient regions in the original activation map and are overlaid on original activating images from $\mathcal{D}_{probe}$ in Step 3. Cropped images are added back to $\mathcal{D}_{probe}$ to form $\mathcal{D}_{probe} \cup \mathcal{D}_{cropped}$.
  • ...and 22 more figures