Table of Contents
Fetching ...

Hierarchical Concept Embedding & Pursuit for Interpretable Image Classification

Nghia Nguyen, Tianjiao Ding, René Vidal

TL;DR

The paper tackles the challenge of interpretable-by-design image classification by enforcing hierarchical, semantically grounded representations. It develops Hierarchical Concept Embedding & Pursuit (HCEP), which constructs a hierarchy-aware dictionary from vision-language embeddings and recovers root-to-leaf concept paths using Hierarchical OMP with beam search. The approach yields superior concept precision and recall over baselines, especially in low-data regimes, while maintaining competitive classification accuracy. These findings highlight the value of incorporating hierarchical priors into sparse coding for more reliable, interpretable vision systems.

Abstract

Interpretable-by-design models are gaining traction in computer vision because they provide faithful explanations for their predictions. In image classification, these models typically recover human-interpretable concepts from an image and use them for classification. Sparse concept recovery methods leverage the latent space of vision-language models to represent image embeddings as a sparse combination of concept embeddings. However, because such methods ignore the hierarchical structure of concepts, they can produce correct predictions with explanations that are inconsistent with the hierarchy. In this work, we propose Hierarchical Concept Embedding \& Pursuit (HCEP), a framework that induces a hierarchy of concept embeddings in the latent space and uses hierarchical sparse coding to recover the concepts present in an image. Given a hierarchy of semantic concepts, we construct a corresponding hierarchy of concept embeddings and, assuming the correct concepts for an image form a rooted path in the hierarchy, derive desirable conditions for identifying them in the embedded space. We show that hierarchical sparse coding reliably recovers hierarchical concept embeddings, whereas vanilla sparse coding fails. Our experiments on real-world datasets demonstrate that HCEP outperforms baselines in concept precision and recall while maintaining competitive classification accuracy. Moreover, when the number of samples is limited, HCEP achieves superior classification accuracy and concept recovery. These results show that incorporating hierarchical structures into sparse coding yields more reliable and interpretable image classification models.

Hierarchical Concept Embedding & Pursuit for Interpretable Image Classification

TL;DR

The paper tackles the challenge of interpretable-by-design image classification by enforcing hierarchical, semantically grounded representations. It develops Hierarchical Concept Embedding & Pursuit (HCEP), which constructs a hierarchy-aware dictionary from vision-language embeddings and recovers root-to-leaf concept paths using Hierarchical OMP with beam search. The approach yields superior concept precision and recall over baselines, especially in low-data regimes, while maintaining competitive classification accuracy. These findings highlight the value of incorporating hierarchical priors into sparse coding for more reliable, interpretable vision systems.

Abstract

Interpretable-by-design models are gaining traction in computer vision because they provide faithful explanations for their predictions. In image classification, these models typically recover human-interpretable concepts from an image and use them for classification. Sparse concept recovery methods leverage the latent space of vision-language models to represent image embeddings as a sparse combination of concept embeddings. However, because such methods ignore the hierarchical structure of concepts, they can produce correct predictions with explanations that are inconsistent with the hierarchy. In this work, we propose Hierarchical Concept Embedding \& Pursuit (HCEP), a framework that induces a hierarchy of concept embeddings in the latent space and uses hierarchical sparse coding to recover the concepts present in an image. Given a hierarchy of semantic concepts, we construct a corresponding hierarchy of concept embeddings and, assuming the correct concepts for an image form a rooted path in the hierarchy, derive desirable conditions for identifying them in the embedded space. We show that hierarchical sparse coding reliably recovers hierarchical concept embeddings, whereas vanilla sparse coding fails. Our experiments on real-world datasets demonstrate that HCEP outperforms baselines in concept precision and recall while maintaining competitive classification accuracy. Moreover, when the number of samples is limited, HCEP achieves superior classification accuracy and concept recovery. These results show that incorporating hierarchical structures into sparse coding yields more reliable and interpretable image classification models.
Paper Structure (34 sections, 7 theorems, 30 equations, 14 figures, 2 tables, 6 algorithms)

This paper contains 34 sections, 7 theorems, 30 equations, 14 figures, 2 tables, 6 algorithms.

Key Result

Proposition 3.1

Suppose the following geometric conditions hold for all nodes in the hierarchy: Then the subtrees rooted at any two sibling nodes are disjoint, and every node has a unique parent.

Figures (14)

  • Figure 1: An illustration of hierarchical concept explanations for image classification. Given an image of a cat, an interpretable-by-design model should extract the concepts that form a rooted path in the hierarchy, and use only these concepts to classify the image as a cat. However, existing concept recovery methods may extract concepts that are inconsistent with the hierarchy, such as vehicle, leading to incorrect explanations.
  • Figure 2: Illustration of the hierarchical latent data model in $\mathbb{R}^2$. The root node spawns two child nodes, animal and tree, each having their own children. There are two desirable conditions for concept identifiability: (1) The children cluster around the parent while sibling nodes are separated; (2) The difference between a child and its parent (shown as red arrows) is orthogonal to the parent, and the differences between the children of a parent form a simplex (which is a line in $\mathbb{R}^2$). The difference vectors capture the characteristics that distinguish each child from its parent. See § \ref{['sec:generative_model']} for more details.
  • Figure 3: An example hierarchy with $L=2$ levels, branching factor $b=2$, and $N_L=6$ nodes in total. Each node $i \in \mathcal{A}$ has an associated vector representation $\bm{a}^{(i)} \in \mathbb{R}^{d}$, where $\mathcal{A} = \{1, \ldots, N_L\}$ is the set of node indices. Let $\mathop{\mathrm{par}}\nolimits(\cdot), \mathop{\mathrm{chi}}\nolimits(\cdot), \mathop{\mathrm{anc}}\nolimits(\cdot), \mathop{\mathrm{desc}}\nolimits(\cdot): \mathcal{A} \rightarrow \mathcal{P}(\mathcal{A})$ respectively be functions returning the set of parent, children, ancestors, and descendants of a node. The level of a node is the number of its ancestors, that is $\mathop{\mathrm{lev}}\nolimits(i)=|\mathop{\mathrm{anc}}\nolimits(i)|$.
  • Figure 4: Illustration of the hierarchical sparse decomposition in Eq. \ref{['eq:hier_sparse_expansion']}. Given an image of a cat, the correct explanation follows the path from the root to the leaf node Cat (shown in green dashed lines). The image embedding is expressed as a sparse linear combination of the root child synset embedding (e.g., Animal) and synset difference embeddings along the path (e.g., Mammal$-$Animal, Cat$-$Mammal). The sparse code has non-zero entries only for atoms corresponding to nodes on the correct path.
  • Figure 5: Hierarchical OMP has improved support recovery precision and recall compared to standard sparse coding methods on synthetic data.
  • ...and 9 more figures

Theorems & Definitions (15)

  • Proposition 3.1: Well-clustered hierarchy ensures unique parent assignment
  • Proposition 3.2: Geometric half-angle decreasing schedule
  • Proposition 3.3: Depth--dimension necessity
  • Proposition 4.1: Informal
  • proof
  • proof
  • proof
  • Lemma B.1: Column normalization equivalence
  • proof
  • Definition B.2: ERC on normalized dictionary
  • ...and 5 more