Hierarchical, Interpretable, Label-Free Concept Bottleneck Model

Haodong Xie, Yujun Cai, Rahul Singh Maharjan, Yiwei Wang, Federico Tavella, Angelo Cangelosi

Abstract

Concept Bottleneck Models (CBMs) introduce interpretability to black-box deep learning models by predicting labels through human-understandable concepts. However, unlike humans, who identify objects at different levels of abstraction using both general and specific features, existing CBMs operate at a single semantic level in both concept and label space. We propose HIL-CBM, a Hierarchical Interpretable Label-Free Concept Bottleneck Model that extends CBMs into a hierarchical framework to enhance interpretability by more closely mirroring the human cognitive process. HIL-CBM enables classification and explanation across multiple semantic levels without requiring relational concept annotations. HIL-CBM aligns the abstraction level of concept-based explanations with that of model predictions, progressing from abstract to concrete. This is achieved by (i) introducing a gradient-based visual consistency loss that encourages abstraction layers to focus on similar spatial regions, and (ii) training dual classification heads, each operating on feature concepts at different abstraction levels. Experiments on benchmark datasets demonstrate that HIL-CBM outperforms state-of-the-art sparse CBMs in classification accuracy. Human evaluations further show that HIL-CBM provides more interpretable and accurate explanations, while maintaining a hierarchical and label-free approach to feature concepts.
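
To make the dual-head design concrete, below is a minimal sketch, assuming a frozen backbone that yields a fixed-size image embedding. The module names and dimensions (`n_coarse_concepts`, `n_fine_concepts`, and the placeholder sizes) are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of a two-level concept bottleneck with dual classification
# heads. Illustrative reconstruction only; concept-space sizes and layer
# choices are assumptions, not the authors' code.
import torch
import torch.nn as nn

class DualLevelCBM(nn.Module):
    def __init__(self, feat_dim, n_coarse_concepts, n_fine_concepts,
                 n_coarse_classes, n_fine_classes):
        super().__init__()
        # Concept layers project the backbone embedding into two
        # human-interpretable concept spaces at different abstraction levels.
        self.coarse_concepts = nn.Linear(feat_dim, n_coarse_concepts)
        self.fine_concepts = nn.Linear(feat_dim, n_fine_concepts)
        # Dual classification heads, one per abstraction level.
        self.coarse_head = nn.Linear(n_coarse_concepts, n_coarse_classes)
        self.fine_head = nn.Linear(n_fine_concepts, n_fine_classes)

    def forward(self, embedding):
        c_coarse = self.coarse_concepts(embedding)  # general concept activations
        c_fine = self.fine_concepts(embedding)      # specific concept activations
        return self.coarse_head(c_coarse), self.fine_head(c_fine), c_coarse, c_fine

# Usage with a dummy batch of 512-d image embeddings (all sizes are placeholders):
model = DualLevelCBM(512, 64, 256, 20, 200)
logits_coarse, logits_fine, _, _ = model(torch.randn(8, 512))
```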

Paper Structure

This paper contains 23 sections, 8 equations, 4 figures, and 6 tables.

Figures (4)

  • Figure 1: Our proposed HIL-CBM extends the CBM architecture into a hierarchical framework. The upper concept and label spaces capture general semantic features and broader class categories, while the lower spaces focus on more specific features and classes.
  • Figure 2: Overview of our proposed model, HIL-CBM. A pre-trained backbone processes input images to produce image embeddings. Two concept layers, each focusing on a different level of abstraction, project the feature maps into interpretable hierarchical concept spaces. These layers are trained with a CLIP-Dissect similarity loss and a visual consistency loss (a rough sketch of the visual consistency term appears after this figure list). Finally, two hierarchical classifiers predict classes at different levels of abstraction, using a cross-entropy loss and a semantic consistency loss on the learned hierarchical concept features.
  • Figure 3: Visualization of two-level predictions from HIL-CBM and the corresponding explanations at each level. The model predicts class labels at both levels, accompanied by concept-based explanations aligned with each level of abstraction. This hierarchical interpretability lets users follow the model's decisions from general to specific.
  • Figure 4: Examples of model debugging. (a) An example where the higher-level prediction is correct, but the lower-level prediction is wrong. Debugging is performed by editing weights using domain knowledge. (b) An example where both predictions are incorrect. Debugging proceeds hierarchically: correcting the higher-level prediction enables guided refinement of the lower-level prediction via class masking.
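
For a feel of the gradient-based visual consistency term referenced in Figure 2, the sketch below shows one plausible formulation: Grad-CAM-style saliency maps are derived for the coarse- and fine-level scores from the same spatial feature tensor, and the two maps are encouraged to overlap. The map construction and the cosine-based penalty are assumptions for illustration, not the paper's exact formulation.

```python
# Hedged sketch of a gradient-based visual consistency loss: saliency maps for
# the coarse- and fine-level scores are computed from shared spatial features
# and pushed toward the same regions. Details (Grad-CAM-style weighting,
# cosine penalty) are assumptions, not the authors' formulation.
import torch
import torch.nn.functional as F

def gradcam_map(score, spatial_feats):
    """Grad-CAM-like map: gradient-derived channel weights, then a weighted sum."""
    grads = torch.autograd.grad(score.sum(), spatial_feats,
                                retain_graph=True, create_graph=True)[0]
    weights = grads.mean(dim=(2, 3), keepdim=True)        # (B, C, 1, 1)
    cam = F.relu((weights * spatial_feats).sum(dim=1))     # (B, H, W)
    return cam / (cam.flatten(1).amax(dim=1).view(-1, 1, 1) + 1e-8)

def visual_consistency_loss(coarse_score, fine_score, spatial_feats):
    cam_coarse = gradcam_map(coarse_score, spatial_feats)
    cam_fine = gradcam_map(fine_score, spatial_feats)
    # Penalise disagreement so both abstraction levels attend to similar regions.
    return 1 - F.cosine_similarity(cam_coarse.flatten(1),
                                    cam_fine.flatten(1), dim=1).mean()
```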