
Coarse-to-Fine Concept Bottleneck Models

Konstantinos P. Panousis, Dino Ienco, Diego Marcos

TL;DR

This work targets ante hoc interpretability, specifically Concept Bottleneck Models (CBMs), and proposes a novel two-level concept discovery formulation that leverages recent advances in vision-language models together with an innovative coarse-to-fine concept selection scheme based on data-driven and sparsity-inducing Bayesian arguments.

Abstract

Deep learning algorithms have recently gained significant attention due to their impressive performance. However, their high complexity and uninterpretable mode of operation hinder their confident deployment in real-world, safety-critical tasks. This work targets ante hoc interpretability, and specifically Concept Bottleneck Models (CBMs). Our goal is to design a framework that admits a highly interpretable decision-making process with respect to human-understandable concepts, on two levels of granularity. To this end, we propose a novel two-level concept discovery formulation leveraging: (i) recent advances in vision-language models, and (ii) an innovative formulation for coarse-to-fine concept selection via data-driven and sparsity-inducing Bayesian arguments. Within this framework, concept information does not rely solely on the similarity between the whole image and general, unstructured concepts; instead, we introduce the notion of a concept hierarchy to uncover and exploit more granular concept information residing in patch-specific regions of the image scene. As we show experimentally, the proposed construction not only outperforms recent CBM approaches, but also yields a principled framework for interpretability.
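The two-level construction described above (a Concept Discovery Block per level, with concepts matched to the whole image on the high level and to patches on the low level) can be sketched in NumPy as follows. This is a minimal illustration under several assumptions that the text does not state: random vectors stand in for CLIP image, patch, and text embeddings; the amortized Bernoulli posterior is a hypothetical linear map (`w_amort`) from the input embedding to per-concept logits; training-time samples use a binary-concrete (Gumbel-sigmoid) relaxation; patch outputs are max-pooled over the $P=9$ patches; and each low-level concept is gated by the indicator of a hypothetical parent high-level concept (`parent`). All function and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)


def l2_normalize(x, axis=-1):
    # Unit-normalize so that dot products are cosine similarities, as with CLIP.
    return x / np.linalg.norm(x, axis=axis, keepdims=True)


def relaxed_bernoulli(logits, temperature, rng):
    # Binary-concrete (Gumbel-sigmoid) sample: a differentiable surrogate for
    # z ~ Bernoulli(sigmoid(logits)); an assumed choice for training.
    u = rng.uniform(1e-6, 1.0 - 1e-6, size=logits.shape)
    logistic_noise = np.log(u) - np.log1p(-u)
    return 1.0 / (1.0 + np.exp(-(logits + logistic_noise) / temperature))


def concept_discovery_block(emb, concept_embs, w_amort, rng, temperature=0.5, train=True):
    """Hypothetical CDB: VLM similarities masked by sampled concept indicators.

    emb:          (d,) image or patch embedding
    concept_embs: (K, d) text embeddings of K concepts
    w_amort:      (d, K) hypothetical amortization weights (posterior logits)
    """
    sims = l2_normalize(concept_embs) @ l2_normalize(emb)  # (K,) cosine similarities
    logits = emb @ w_amort  # one amortized Bernoulli logit per concept
    if train:
        z = relaxed_bernoulli(logits, temperature, rng)
    else:
        z = (logits > 0).astype(float)  # hard indicators: keep concepts with p > 0.5
    return sims * z, z


# Toy sizes: embedding dim, high-/low-level concepts, patches (P = 9), classes.
d, k_high, k_low, n_patches, n_classes = 16, 4, 12, 9, 3
image = rng.normal(size=d)
patches = rng.normal(size=(n_patches, d))
high_concepts = rng.normal(size=(k_high, d))
low_concepts = rng.normal(size=(k_low, d))
w_high = 0.1 * rng.normal(size=(d, k_high))
w_low = 0.1 * rng.normal(size=(d, k_low))
# Hypothetical hierarchy: low-level concept j is an attribute of high-level concept parent[j].
parent = np.arange(k_low) % k_high

# High level: the whole image against the high-level concepts.
h_high, z_high = concept_discovery_block(image, high_concepts, w_high, rng)

# Low level: every patch against the low-level concepts, max-pooled over patches.
patch_out = np.stack([concept_discovery_block(p, low_concepts, w_low, rng)[0] for p in patches])
h_low = patch_out.max(axis=0)  # (k_low,)
h_low = h_low * z_high[parent]  # gate attributes by their parent's indicator (assumed link)

# Linear classifier over the concatenated, interpretable bottleneck.
w_cls = rng.normal(size=(k_high + k_low, n_classes))
scores = np.concatenate([h_high, h_low]) @ w_cls
print("class scores:", scores)
```

Max-pooling is only one plausible way to aggregate patch information into a single representation; mean pooling would fit the description equally well. The sparsity-inducing prior and its KL term, which push the indicators toward few active concepts during training, are omitted from this sketch.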

Paper Structure

This paper contains 23 sections, 16 equations, 13 figures, 4 tables.

Figures (13)

  • Figure 1: (Left) The Concept Discovery Block (CDB). Given a set of concepts and an image, we compute their similarity via a VLM; we consider a data-driven mechanism for concept discovery, sampling from an amortized Bernoulli posterior. (Right) A schematic of the envisioned CF-CBMs. We consider a set of high-level concepts, each described by a number of attributes; these attributes form the pool of low-level concepts. Our objective is to discover concepts that describe the whole image, while exploiting information residing in patch-specific regions (here, $P=9$) by matching low-level concepts to each patch and aggregating the information to obtain a single representation. Each level comprises CDBs, and the levels are linked together via the binary indicators $\boldsymbol Z_H$ and $\boldsymbol Z_L$.
  • Figure 2: Original and additional discovered concepts for the Sussex Spaniel ImageNet class. Green denotes concepts retained from the original low-level set pertaining to the class, maroon denotes concepts removed via the binary indicators $\boldsymbol Z$, and purple denotes newly discovered concepts.
  • Figure 3: A random example from the Black Swan class of the ImageNet-1k validation set. The upper part depicts the original concept set corresponding to the class; the lower part shows some of the concepts discovered via our novel CF-CBM.
  • Figure 4: Alignment between the inferred concept presence indicators and CLIP similarities on the High Level of the CF-CBM framework. We split the CLIP similarities into bins of size $0.05$; for each bin, we count the number of concepts whose CLIP similarity falls in it (denoted by $\#$Conc) and compute the fraction of those concepts that are inferred as active. We observe that the higher the similarity, the larger the percentage of active concepts.
  • Figure 5: Alignment between the inferred concept presence indicators and CLIP similarities on the High Level of the CF-CBM framework for Example 50 in the CUB validation set. In this instance, we observe the same pattern as in Fig. 4.
  • ...and 8 more figures
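The per-bin statistic described in the Figure 4 caption can be computed along the following lines. This is an illustrative sketch, not the authors' evaluation code: it assumes the CLIP similarities and the inferred binary indicators for one image are available as arrays, and the function name `activation_rate_by_similarity` and the half-open bin convention are hypothetical choices. The demo data are synthetic and carry no experimental meaning.

```python
import numpy as np


def activation_rate_by_similarity(sims, active, bin_width=0.05):
    """Group concepts by CLIP similarity and report how many are inferred active.

    sims:   (K,) CLIP similarities between one image and K concepts
    active: (K,) inferred binary presence indicators z
    Returns (bin_low, bin_high, n_concepts, fraction_active) for each non-empty bin.
    """
    sims = np.asarray(sims, dtype=float)
    active = np.asarray(active, dtype=bool)
    bin_idx = np.floor(sims / bin_width).astype(int)  # half-open bins [k*w, (k+1)*w)
    rows = []
    for b in np.unique(bin_idx):
        in_bin = bin_idx == b
        rows.append((
            float(b * bin_width),
            float((b + 1) * bin_width),
            int(in_bin.sum()),  # "#Conc" in Figure 4
            float(active[in_bin].mean()),  # fraction of active concepts in the bin
        ))
    return rows


# Synthetic demo: activation probability grows with similarity by construction.
rng = np.random.default_rng(0)
sims = rng.uniform(0.10, 0.35, size=200)
active = rng.uniform(size=200) < (sims - 0.10) / 0.25
for lo, hi, n, frac in activation_rate_by_similarity(sims, active):
    print(f"[{lo:.2f}, {hi:.2f})  #Conc={n:3d}  active={frac:.2f}")
```

Only non-empty bins are reported, which avoids dividing by zero when a similarity range contains no concepts.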