Hierarchical Concept-based Interpretable Models

Oscar Hill; Mateo Espinosa Zarlenga; Mateja Jamnik

TL;DR

We introduce Hierarchical Concept Embedding Models (HiCEMs), a new family of Concept Embedding Models (CEMs) that explicitly model concept relationships through hierarchical structures. HiCEMs support test-time concept interventions at different granularities, which leads to improved task accuracy.

Abstract

Modern deep neural networks remain challenging to interpret due to the opacity of their latent representations, impeding model understanding, debugging, and debiasing. Concept Embedding Models (CEMs) address this by mapping inputs to human-interpretable concept representations from which tasks can be predicted. Yet, CEMs fail to represent inter-concept relationships and require concept annotations at different granularities during training, limiting their applicability. In this paper, we introduce Hierarchical Concept Embedding Models (HiCEMs), a new family of CEMs that explicitly model concept relationships through hierarchical structures. To enable HiCEMs in real-world settings, we propose Concept Splitting, a method for automatically discovering finer-grained sub-concepts from a pretrained CEM's embedding space without requiring additional annotations. This allows HiCEMs to generate fine-grained explanations from limited concept labels, reducing annotation burdens. Our evaluation across multiple datasets, including a user study and experiments on PseudoKitchens, a newly proposed concept-based dataset of 3D kitchen renders, demonstrates that (1) Concept Splitting discovers human-interpretable sub-concepts absent during training that can be used to train highly accurate HiCEMs, and (2) HiCEMs enable powerful test-time concept interventions at different granularities, leading to improved task accuracy.
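The label-creation step of Concept Splitting can be illustrated with a minimal sketch. It follows the rule stated in the Figure 1 caption: a sample receives the new sub-concept only when the parent concept is active and the SAE feature is active. The function name `make_subconcept_labels`, the `threshold` parameter, and the toy data are assumptions made for illustration; they are not taken from the paper.

```python
from typing import List, Sequence


def make_subconcept_labels(
    parent_active: Sequence[bool],
    sae_activations: Sequence[float],
    threshold: float = 0.0,  # assumption: a feature counts as "active" above this value
) -> List[int]:
    """Label a sample 1 only if the parent concept is active AND the SAE feature fires."""
    if len(parent_active) != len(sae_activations):
        raise ValueError("inputs must have one entry per sample")
    return [
        int(bool(active) and activation > threshold)
        for active, activation in zip(parent_active, sae_activations)
    ]


# Hypothetical toy data: four samples, one parent concept, one SAE feature.
parent_active = [True, True, False, False]
sae_activations = [0.8, 0.0, 0.5, 0.0]
labels = make_subconcept_labels(parent_active, sae_activations)
print(labels)  # [1, 0, 0, 0]
```

Only the first sample is labelled positive: the third sample has an active SAE feature but an inactive parent concept, so it is marked as not having the new concept.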

Paper Structure (57 sections, 14 figures, 14 tables, 1 algorithm)

Introduction
Background and Related Work
Concept learning
Concept Embedding Models
Concept discovery
Modelling concept relationships
Concept Splitting
Hierarchical CEMs
Architecture
Sub-concepts Modules
Training
Concept Interventions
Experiments
PseudoKitchens
Setup
...and 42 more sections

Figures (14)

Figure 1: Concept Splitting. (a) Train a CEM and calculate concept embeddings. (b) Train SAEs on the embeddings (the image depicts a single embedding set). (c) Create concept labels. The green points are marked as having the new concept, and the black points (where the parent concept is not active or where the SAE feature is not active) are marked as not having the new concept.
Figure 2: Hierarchical CEM: as in a CEM, from a latent code $\mathbf{h}$, we learn two embeddings per concept ($\mathbf{\hat{c}_i^{+\prime}}$ and $\mathbf{\hat{c}_i^{-\prime}}$). These embeddings are then passed through sub-concepts modules (Figure 3), which produce new embeddings ($\mathbf{\hat{c}_i^{+}}$ and $\mathbf{\hat{c}_i^{-}}$) that include information about sub-concepts. The sub-concepts modules also output the most likely sub-concept probabilities, which are used to calculate top-level concept probabilities. These probabilities are used to output an embedding for each concept via a weighted mixture of positive and negative embeddings.
Figure 3: Positive sub-concepts module: positive sub-concept embedding generators $\phi_{kj}^+$ (illustrated for "contains apples" and "contains pears") produce sub-concept embeddings from the preliminary parent concept embedding $\mathbf{\hat{c}}_k^{+\prime}$. A shared scoring function $s(\cdot)$ predicts probabilities for each sub-concept. The positive parent concept embedding $\mathbf{\hat{c}}_k^+$ is computed as a weighted mixture of sub-concept embeddings, whilst an estimate for the parent concept probability $\hat{p}_k^+$ is obtained via a differentiable soft maximum operation over the sub-concept probabilities.
Figure 4: Task accuracy as discovered concepts are intervened on. Intervening on discovered sub-concepts improves task accuracy. In some cases, such as in CUB and PseudoKitchens, interventions in HiCEMs lead to a greater increase in task accuracy than in CEMs trained with Concept Splitting's discovered concepts. LF-CBMs \cite{labelfree} do not easily support interventions.
Figure 5: Change in task accuracy as provided concepts are intervened on. Provided concept interventions on ImageNet are shown in Appendix \ref{appendix:imagenetinterventions}. Provided concept interventions work just as well in HiCEMs as they do in CEMs.
...and 9 more figures
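
The mechanics described in the captions of Figures 2 and 3 can be sketched in a few lines of plain Python. This is a sketch under stated assumptions, not the authors' implementation: the sub-concept embeddings and probabilities are passed in directly as stand-ins for the outputs of the generators $\phi_{kj}^+$ and the scoring function $s(\cdot)$; the mixture weights are taken to be the normalised sub-concept probabilities; and the "soft maximum" is assumed to be a Boltzmann (softmax-weighted) average with a hypothetical `temperature`. All function names are illustrative.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = List[float]


def mix(weights: Sequence[float], vectors: Sequence[Vector]) -> Vector:
    """Return the weighted sum of equally sized vectors."""
    dim = len(vectors[0])
    return [sum(w * v[d] for w, v in zip(weights, vectors)) for d in range(dim)]


def soft_maximum(probs: Sequence[float], temperature: float = 10.0) -> float:
    """Assumed soft maximum: a softmax-weighted average that approaches max(probs)."""
    exps = [math.exp(temperature * p) for p in probs]
    return sum(p * e for p, e in zip(probs, exps)) / sum(exps)


def positive_subconcepts_module(
    sub_embeddings: Sequence[Vector],
    sub_probs: Sequence[float],
    interventions: Optional[Sequence[Optional[float]]] = None,
) -> Tuple[Vector, float]:
    """Mix sub-concept embeddings and estimate the parent concept probability.

    `interventions` holds a ground-truth value (0.0 or 1.0) for each
    sub-concept an expert corrects at test time, or None to keep the prediction.
    """
    probs = list(sub_probs)
    if interventions is not None:
        probs = [p if t is None else float(t) for p, t in zip(probs, interventions)]
    total = sum(probs)
    # Assumption: weights are normalised probabilities, uniform if all are zero.
    if total > 0:
        weights = [p / total for p in probs]
    else:
        weights = [1.0 / len(probs)] * len(probs)
    return mix(weights, sub_embeddings), soft_maximum(probs)


def concept_embedding(p: float, positive: Vector, negative: Vector) -> Vector:
    """Weighted mixture of positive and negative embeddings, as in a CEM."""
    return mix([p, 1.0 - p], [positive, negative])


# Hypothetical sub-concepts "contains apples" and "contains pears".
sub_embeddings = [[1.0, 0.0], [0.0, 1.0]]
sub_probs = [0.9, 0.1]
parent_emb, parent_prob = positive_subconcepts_module(sub_embeddings, sub_probs)

# Sub-concept intervention: an expert states there are pears and no apples.
fixed_emb, fixed_prob = positive_subconcepts_module(
    sub_embeddings, sub_probs, interventions=[0.0, 1.0]
)

# Final concept embedding; the negative embedding is a placeholder, and using
# the positive module's estimate directly as p is an assumption.
final_emb = concept_embedding(fixed_prob, fixed_emb, [-1.0, -1.0])
```

In this sketch, correcting a sub-concept changes both the parent embedding and the parent probability estimate, which is one way to picture how interventions at a finer granularity could propagate upwards to the top-level concept.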
