Shortcuts and Identifiability in Concept-based Models from a Neuro-Symbolic Lens

Samuele Bortolotti, Emanuele Marconato, Paolo Morettin, Andrea Passerini, Stefano Teso

TL;DR

The paper studies the identifiability and interpretability of concept-based models (CBMs) and neuro-symbolic CBMs (NeSy-CBMs) in the presence of reasoning shortcuts (RSs). It extends the RS framework to CBMs whose inference layer is learned and that receive no concept supervision, introducing the notions of intended semantics and joint reasoning shortcuts (JRSs). It derives conditions under which maximum-likelihood training identifies the ground-truth concepts and the inference layer, and shows that JRSs are prevalent in practice while standard mitigations are often ineffective. Case studies on MNIST-based tasks, Clevr, and BDD-OIA demonstrate the practical impact of JRSs on interpretability and out-of-distribution (OOD) robustness, highlighting the need for stronger supervision or new mitigation strategies.

Abstract

Concept-based Models are neural networks that learn a concept extractor to map inputs to high-level concepts and an inference layer to translate these into predictions. Ensuring these modules produce interpretable concepts and behave reliably out of distribution is crucial, yet the conditions for achieving this remain unclear. We study this problem by establishing a novel connection between Concept-based Models and reasoning shortcuts (RSs), a common issue where models achieve high accuracy by learning low-quality concepts, even when the inference layer is fixed and provided upfront. Specifically, we extend RSs to the more complex setting of Concept-based Models and derive theoretical conditions for identifying both the concepts and the inference layer. Our empirical results highlight the impact of RSs and show that existing methods, even combined with multiple natural mitigation strategies, often fail to meet these conditions in practice.
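To make the two-module structure concrete, here is a minimal sketch of a concept extractor followed by an inference layer on the sum-parity task. It is only an illustration under strong assumptions: the "inputs" are the ground-truth digits themselves rather than images, both modules are hand-written rather than learned, and every function name is hypothetical. The second model collapses each digit to its parity, a toy instance of a model that predicts every label correctly while extracting low-quality concepts.

```python
from itertools import product

# Assumption: the ground-truth digit pair stands in for the input images.
DATA = list(product(range(10), repeat=2))


def true_label(g):
    """Ground-truth label: parity of the sum of the two digits."""
    return (g[0] + g[1]) % 2


def extract_intended(g):
    """Concept extractor that recovers the digits exactly."""
    return g


def infer_intended(c):
    """Inference layer computing the parity of the predicted digits."""
    return (c[0] + c[1]) % 2


def extract_shortcut(g):
    """Concept extractor that keeps only the parity of each digit."""
    return (g[0] % 2, g[1] % 2)


def infer_shortcut(c):
    """Inference layer matched to the collapsed concepts."""
    return c[0] ^ c[1]


def label_accuracy(extract, infer):
    """Fraction of inputs whose predicted label matches the true label."""
    hits = sum(infer(extract(g)) == true_label(g) for g in DATA)
    return hits / len(DATA)


def concept_accuracy(extract):
    """Fraction of inputs whose predicted concepts equal the true digits."""
    hits = sum(extract(g) == g for g in DATA)
    return hits / len(DATA)


print(label_accuracy(extract_intended, infer_intended), concept_accuracy(extract_intended))
print(label_accuracy(extract_shortcut, infer_shortcut), concept_accuracy(extract_shortcut))
```

Both models reach perfect label accuracy, but the second one recovers the true digits only on the four pairs drawn from {0, 1}, so label accuracy alone cannot distinguish them.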

Paper Structure

This paper contains 40 sections, 12 theorems, 96 equations, 14 figures, 15 tables.

Key Result

Theorem 3.6

Under the paper's assumptions on concepts and labels, the number of deterministic JRSs is given by an expression whose sum runs over $\mathsf{Vert}(\mathcal{A}) \times \mathsf{Vert}(\mathcal{B})$, and in which $C[\mathcal{G}]$ counts the pairs with intended semantics.
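The flavor of such a count can be illustrated by brute force on a toy problem. The sketch below is not the theorem's formula. It assumes two binary ground-truth concepts with XOR as the ground-truth inference layer, treats a deterministic extractor as an arbitrary map `alpha` from ground-truth to predicted concept vectors and a deterministic inference layer as a lookup table `beta`, and counts the pairs that reproduce every label on a given support. That count includes pairs with intended semantics as well as shortcuts; computing $C[\mathcal{G}]$ is not attempted. All names are hypothetical.

```python
from itertools import product

# Assumption: two binary concepts, so four possible concept vectors.
CONCEPTS = list(product([0, 1], repeat=2))


def beta_star(g):
    """Ground-truth inference layer: parity (XOR) of the two concepts."""
    return (g[0] + g[1]) % 2


def count_label_consistent_pairs(support):
    """Count deterministic (alpha, beta) pairs that fit all labels on `support`."""
    count = 0
    # Enumerate every deterministic map alpha: concepts -> concepts (4^4 maps).
    for alpha_vals in product(CONCEPTS, repeat=len(CONCEPTS)):
        alpha = dict(zip(CONCEPTS, alpha_vals))
        # Enumerate every deterministic map beta: concepts -> label (2^4 maps).
        for beta_vals in product([0, 1], repeat=len(CONCEPTS)):
            beta = dict(zip(CONCEPTS, beta_vals))
            if all(beta[alpha[g]] == beta_star(g) for g in support):
                count += 1
    return count


# Full support: every combination of the two concepts is observed.
full = count_label_consistent_pairs(CONCEPTS)
# Restricted support: one combination, (0, 1), is never observed.
partial = count_label_consistent_pairs([(0, 0), (1, 1), (1, 0)])
print(full, partial)  # prints: 168 384
```

Of the 4096 deterministic pairs, 168 fit the labels on the full support and 384 fit them once a single combination is withheld, which shows how quickly label-consistent solutions multiply as the training support shrinks.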

Figures (14)

  • Figure 1: Joint reasoning shortcuts. The goal is to predict whether the sum of two MNIST digits is odd (as in Example 2.1) from a training set of all possible unique (even, even), (odd, odd), and (odd, even) pairs of MNIST digits. Green elements are fixed, purple ones are learned. Left: ground-truth concepts and inference layer. Middle: NeSy-CBMs with given knowledge can learn reasoning shortcuts, i.e., concepts with unintended semantics. Right: CBMs can learn joint reasoning shortcuts, i.e., both concepts and inference layer have unintended semantics.
  • Figure 2: Examples of semantics in MNIST-SumParity restricted to $\bm{\mathrm{g}}, \bm{\mathrm{c}} \in \{0, 1, 2\}^2$ for readability. Left: ideally, $\bm{\mathrm{\alpha}}$ should be the identity (i.e., $\bm{\mathrm{C}}$ recovers the ground-truth concepts $\bm{\mathrm{G}}$) and the inference layer should learn $\bm{\mathrm{\beta}}^*$. Middle: $(\bm{\mathrm{\alpha}}_\mathrm{IS}, \bm{\mathrm{\beta}}_\mathrm{IS}) \ne (\mathrm{id}, \bm{\mathrm{\beta}}^*)$ has intended semantics (Definition 3.3), i.e., the ground-truth concepts and inference layer can be recovered and generalize OOD. Right: $(\bm{\mathrm{\alpha}}_\mathrm{JRS}, \bm{\mathrm{\beta}}_\mathrm{JRS})$ is affected by the joint reasoning shortcut in Figure 1. Elements (predicted concepts $\bm{\mathrm{C}}$ and entries in $\bm{\mathrm{\beta}}$) in red are never predicted nor used, highlighting simplicity bias. Maps are visualized as matrices.
  • Figure 3: Traditional mitigations have limited effect on CBNM for MNIST-SumParity. The only outlier is contrastive learning (orange and purple), which consistently ameliorates concept collapse.
  • Figure 4: Example of Clevr data.
  • Figure 5: \ref{fig:curves-sumxor-reduced} with standard deviation over $5$ seeds.
  • ...and 9 more figures

Theorems & Definitions (26)

  • Example 2.1: MNIST-SumParity
  • Example 2.2
  • Definition 3.3: Intended Semantics
  • Definition 3.4: Joint Reasoning Shortcut
  • Example 3.5
  • Theorem 3.6: Informal
  • Corollary 3.6
  • Theorem 3.8: Identifiability
  • Lemma C.1: Marconato et al., 2023
  • Lemma C.2: Deterministic optima of the likelihood
  • ...and 16 more