LLM-assisted Concept Discovery: Automatically Identifying and Explaining Neuron Functions

Nhat Hoang-Xuan, Minh Vu, My T. Thai

TL;DR

This work shows that, without a restricted set of pre-defined concepts, the proposed method gives rise to novel interpretable concepts that are more faithful to the model's behavior, providing a credible automated tool to explain deep neural networks.

Abstract

Providing textual concept-based explanations for neurons in deep neural networks (DNNs) is important for understanding how a DNN model works. Prior works have associated concepts with neurons based on examples of concepts or a pre-defined set of concepts, which limits possible explanations to what the user expects and hinders the discovery of new concepts. Furthermore, defining the set of concepts requires manual work from the user, either by directly specifying them or collecting examples. To overcome these limitations, we propose to leverage multimodal large language models for automatic and open-ended concept discovery. We show that, without a restricted set of pre-defined concepts, our method gives rise to novel interpretable concepts that are more faithful to the model's behavior. To quantify this, we validate each concept by generating examples and counterexamples and evaluating the neuron's response on this new set of images. Collectively, our method can discover concepts and simultaneously validate them, providing a credible automated tool to explain deep neural networks.
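
The validation step described above compares a neuron's response on generated examples with its response on generated counterexamples. The abstract does not state how that comparison is turned into a number, so the following is only a minimal sketch under one assumption: that the score is an AUC-style separation measure, i.e. the probability that a randomly chosen example activates the neuron more strongly than a randomly chosen counterexample. The function name `separation_score` and the toy activation values are hypothetical and not taken from the paper.

```python
def separation_score(example_acts, counterexample_acts):
    """AUC-style score (an assumed metric, not confirmed by the source).

    Returns the fraction of (example, counterexample) pairs in which the
    example has the higher neuron activation; ties count as half a win.
    1.0 means perfect separation, 0.5 means no better than chance.
    """
    if not example_acts or not counterexample_acts:
        raise ValueError("both activation lists must be non-empty")
    wins = 0.0
    for pos in example_acts:
        for neg in counterexample_acts:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(example_acts) * len(counterexample_acts))


# Toy activations (made up for illustration only).
acts_on_examples = [0.9, 0.8, 0.4]
acts_on_counterexamples = [0.1, 0.5, 0.2]
score = separation_score(acts_on_examples, acts_on_counterexamples)
print(round(score, 3))
```

Under this assumption, a concept that describes the neuron well yields a score close to 1, while a concept unrelated to the neuron yields a score near 0.5.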

Paper Structure (26 sections, 6 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Overview of our LLM-assisted concept discovery algorithm. Given a neuron of interest, we generate an exemplar representation from the probing dataset. Next, we find an interpretable concept by looking for a group of images with high average CLIP similarity. Finally, we prompt the MLLM with the combined image for the common concept that the neuron activates on.
  • Figure 2: Overview of our concept evaluation. Given a concept, we use Chain-of-Thought prompting on an LLM to generate its co-hyponyms. Then, we use another prompt to generate captions for the concept and the co-hyponyms. The last step utilizes a diffusion model to transform the captions into the sets of examples and non-examples $E_c$ and $E_{\bar{c}}$.
  • Figure 3: Count of occurrences of the top-50 most frequent words in the explanations of the last layer of CLIP ResNet50.
  • Figure 4: Examples of cases where our method produces a more detailed concept. All neurons are labeled as "Text" by MILAN.
  • Figure 5: Example of concept validation in action. We show the process of evaluating our proposed concept "nfl team merchandise", which results in a score of $0.984$ on the full set of $E_c$ and $E_{\bar{c}}$. We italicise and underline important words in the captions that are directly related to the concept or the co-hyponym.
  • ...and 1 more figures
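
The Figure 1 caption mentions looking for a group of images with high average CLIP similarity. The caption does not say how that group is searched for, so the sketch below makes two assumptions: the images are already represented as embedding vectors (plain lists stand in for CLIP image embeddings here), and the group is found by exhaustively scoring every subset of size `k` by its mean pairwise cosine similarity. The names `cosine` and `most_coherent_group` and the toy vectors are hypothetical, and the brute-force search is only practical for a small pool of exemplars.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def most_coherent_group(embeddings, k):
    """Return (indices, score) of the size-k subset with the highest
    mean pairwise cosine similarity (exhaustive search, an assumption)."""
    if not 2 <= k <= len(embeddings):
        raise ValueError("k must be between 2 and the number of embeddings")
    best_group, best_score = None, -math.inf
    for group in combinations(range(len(embeddings)), k):
        pairs = list(combinations(group, 2))
        score = sum(cosine(embeddings[i], embeddings[j]) for i, j in pairs) / len(pairs)
        if score > best_score:
            best_group, best_score = group, score
    return best_group, best_score


# Toy 2-D "embeddings": items 0, 1 and 3 point in nearly the same direction.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.95, 0.05], [-1.0, 0.0]]
group, group_score = most_coherent_group(toy_embeddings, k=3)
print(group)
```

In this toy setting the selected indices are the three vectors that point in almost the same direction, which mirrors the idea of picking exemplars that share one visual concept before prompting the MLLM.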