Select, Hypothesize and Verify: Towards Verified Neuron Concept Interpretation

ZeBin Ji, Yang Hu, Xiuli Bi, Bo Liu, Bin Xiao

Abstract

Interpreting the functionality (also known as the concepts) of neurons is essential for understanding neural network decisions. Existing approaches describe neuron concepts by generating natural language descriptions, thereby advancing the understanding of the network's decision-making mechanism. However, these approaches assume that every neuron has a well-defined function and provides discriminative features for decision-making. In fact, some neurons may be redundant or may suggest misleading concepts, so descriptions of such neurons can misrepresent the factors driving the network's decisions. To address this issue, we introduce a verification of neuron functions, which checks whether a generated concept highly activates the corresponding neuron. Building on this, we propose a Select-Hypothesize-Verify framework for interpreting neuron functionality. The framework consists of: 1) selecting, via activation-distribution analysis, the activation samples that best capture a neuron's well-defined functional behavior; 2) forming concept hypotheses for the selected neurons; and 3) verifying whether the generated concepts accurately reflect the neuron's functionality. Extensive experiments show that our method produces more accurate neuron concepts: our generated concepts activate the corresponding neurons with a probability approximately 1.5 times that of the current state-of-the-art method.
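The three steps of the framework might be sketched as follows. This is a minimal illustration only: the callables `activation`, `cluster`, `vlm_describe`, and `text_to_image`, and both thresholds, are hypothetical stand-ins for components the abstract names only at a high level.

```python
import numpy as np

def shv_pipeline(neuron_id, samples, activation, cluster, vlm_describe,
                 text_to_image, select_thresh=0.9, verify_thresh=0.5):
    """Hedged sketch of the Select-Hypothesize-Verify loop.
    All callables and thresholds are assumptions, not the paper's
    actual implementation."""
    # 1) Select: keep samples whose min-max-normalized activation is high.
    acts = np.asarray([activation(neuron_id, s) for s in samples], dtype=float)
    norm = (acts - acts.min()) / (acts.max() - acts.min() + 1e-8)
    high = [s for s, a in zip(samples, norm) if a > select_thresh]
    # 2) Hypothesize: cluster high-activation samples and describe each
    #    cluster with a vision-language model.
    concepts = [vlm_describe(c) for c in cluster(high)]
    # 3) Verify: render each concept with text-to-image generation and
    #    measure how often the generated images activate the neuron.
    verified = []
    for concept in concepts:
        images = text_to_image(concept)
        rate = float(np.mean([activation(neuron_id, im) > verify_thresh
                              for im in images]))
        if rate > 0.5:  # assumed acceptance criterion
            verified.append((concept, rate))
    return verified
```

The key design point, per the abstract, is step 3: a concept is only kept if it demonstrably re-activates the neuron it claims to explain.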

Paper Structure

This paper contains 14 sections, 4 equations, 6 figures, and 6 tables.

Figures (6)

  • Figure 1: Comparison of previous methods and our proposed method with verification. The neurons are sampled from the second-to-last layer of ResNet-18. Existing methods assume that concepts inferred from activated neurons are all accurate, while our proposed method introduces verification to identify incorrect ones. Specifically, for neuron 162, both concepts (Small Round Whiskers and Curly Dense Coat) are inferred from high-activation images. Verification shows that the activation rate for Small Round Whiskers (0.26) suggests this concept may be incorrect, while Curly Dense Coat achieves 0.98, confirming its correctness.
  • Figure 2: Overview of SIEVE. 1) Select: For each sample, we obtain the normalized activation of neuron $i$ from the network’s penultimate layer. Samples with activations above a predefined threshold are identified as high-activation samples. 2) These high-activation samples are clustered, and a vision-language model is used to generate the hypothesized concepts for each cluster. 3) Each concept is converted into a semantic image using text-to-image generation. The generated images are fed into the target model to measure the activation rate of neuron $i$, thereby verifying the semantic features that the neuron reliably encodes.
  • Figure 3: Activation distributions and Concepts of high-discrimination neurons (Neuron 507) and low-discrimination neurons (Neuron 144). The high activation of high-discrimination neurons exhibits consistent patterns while the high activation of low-discrimination neurons fails to do so.
  • Figure 4: Functional descriptions of penultimate-layer neurons in ResNet-50 and ViT-B/16, generated by SIEVE, CLIP-Dissect, and WWW. SIEVE provides more complete and accurate semantic explanations. It captures localized features and multiple concepts, rather than only broad attributes such as object category or color. For example, ViT-B/16 neuron 37 is described by SIEVE with specific cues like Short Dense Coat, whereas CLIP-Dissect and WWW may omit some relevant concepts or provide only coarse labels such as Dog. Overall, the neuron-level characterizations produced by SIEVE are more fine-grained and validated.
  • Figure 5: Visualization of the domain shift effect. Significant discrepancies exist between aerial and generated samples.
  • ...and 1 more figure
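The verification statistic that appears in the captions above (e.g. the 0.26 vs. 0.98 activation rates in Figure 1) could be computed as a simple thresholded fraction. A minimal sketch, assuming an activation threshold that this excerpt does not specify:

```python
import numpy as np

def activation_rate(neuron_activations, threshold):
    """Fraction of concept-conditioned generated images whose activation
    of the target neuron exceeds `threshold`. The threshold value is an
    assumption; the paper reports rates such as 0.26 and 0.98 but the
    exact criterion is not given in this excerpt."""
    acts = np.asarray(neuron_activations, dtype=float)
    return float((acts > threshold).mean())
```

A low rate (like 0.26) indicates the hypothesized concept fails to reliably re-activate the neuron and should be rejected, while a high rate (like 0.98) supports keeping it.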