A Neuro-Symbolic Benchmark Suite for Concept Quality and Reasoning Shortcuts

Samuele Bortolotti; Emanuele Marconato; Tommaso Carraro; Paolo Morettin; Emile van Krieken; Antonio Vergari; Stefano Teso; Andrea Passerini

Abstract

The advent of powerful neural classifiers has increased interest in problems that require both learning and reasoning. These problems are critical for understanding important properties of models, such as trustworthiness, generalization, interpretability, and compliance with safety and structural constraints. However, recent research has observed that tasks requiring both learning and reasoning on background knowledge often suffer from reasoning shortcuts (RSs): predictors can solve the downstream reasoning task without associating the correct concepts with the high-dimensional data. To address this issue, we introduce rsbench, a comprehensive benchmark suite designed to systematically evaluate the impact of RSs on models by providing easy access to highly customizable tasks affected by RSs. Furthermore, rsbench implements common metrics for evaluating concept quality and introduces novel formal verification procedures for assessing the presence of RSs in learning tasks. Using rsbench, we highlight that obtaining high-quality concepts in both purely neural and neuro-symbolic models is a far-from-solved problem. rsbench is available at: https://unitn-sml.github.io/rsbench.

Paper Structure (104 sections, 1 theorem, 23 equations, 4 figures, 33 tables)

Introduction
Reasoning Shortcuts: Causes, Consequences, and Scope
The rsbench Benchmark Suite
Arithmetical Tasks and Data Sets
Logical Tasks and Data Sets
High-stakes Tasks and Data Sets
Metrics for Reasoning Shortcuts
Evaluating RSs and Concept Quality with rsbench
Discussion and Conclusion
Metrics: Additional Details
Model-level Metrics
Label and Concept Evaluation.
Concept Collapse.
The Impact of TCAV
Task-level Metrics
...and 89 more sections

Key Result

Theorem 1

Under A1 and A2, the number of models of the form $p_\theta ({\bm{\mathrm{c}}} \mid {\bm{\mathrm{x}}}) = \mathbbm{1}\!\left\{{\bm{\mathrm{c}}} = f_\theta({\bm{\mathrm{x}}})\right\}$, with $f_\theta = (\alpha \circ f^*)$, attaining maximum likelihood amounts to:

Figures (4)

Figure 1: Role of concepts in deep learning models. (a) NeSy architectures like DeepProbLog (DPL) and Logic Tensor Networks (LTN) map the input ${\bm{\mathrm{x}}}$ to concepts ${\bm{\mathrm{c}}}$ and reason over these according to prior knowledge to obtain a label ${\bm{\mathrm{y}}}$. (b) CBMs are similar, except the prediction is computed by a learned linear layer, making it easy to obtain concept-level explanations of all predictions. (c) Black-box neural networks infer a label ${\bm{\mathrm{y}}}$ directly from the input ${\bm{\mathrm{x}}}$; concepts ${\bm{\mathrm{c}}}$ can be extracted from their latent representation by applying techniques like TCAV kim2018interpretability. Lightning bolts indicate which variables are usually supervised.
Figure 2: Inference and training in regular NeSy architectures for one BDD-OIA example xu2020explainable. The input ${\bm{\mathrm{x}}}$ is a dashcam image. The model first extracts concepts ${\bm{\mathrm{c}}} = (c_{\tt grn}, c_{\tt red}, c_{\tt ped}) \in \{0, 1\}^3$ from the image using a neural backbone (NN) and then uses a (differentiable) reasoning layer to infer a vector label ${\bm{\mathrm{y}}} = (y_{\tt go}, y_{\tt stop}, y_{\tt left}, y_{\tt right})$. While the model includes a neural component, the labels depend solely on the extracted concepts. The reasoning layer is aware of prior knowledge ${\sf K}$, which encodes constraints like "if a pedestrian or a red light is detected, the prediction must be stop."
Figure 3: MNLogic reasoning shortcut
Figure 4: Illustration of the sampling process of SDD-OIA
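
The setup in the Figure 2 caption can be made concrete with a small sketch of a reasoning shortcut. The code below is an illustration under stated assumptions, not the rsbench API or the BDD-OIA knowledge base: it keeps only the single rule quoted in the caption (pedestrian or red light implies stop), assumes all eight concept combinations occur in the data, and replaces the neural backbone with hand-written functions. The names `stop_label`, `intended`, and `shortcut` are hypothetical.

```python
from itertools import product

# Concept order follows the Figure 2 caption: (c_grn, c_red, c_ped).
def stop_label(c):
    """Toy reasoning layer: only the rule quoted in the caption (assumption)."""
    grn, red, ped = c
    return int(red or ped)

def intended(g):
    """Intended concept extractor: returns the ground-truth concepts unchanged."""
    return g

def shortcut(g):
    """Hypothetical shortcut: confuses 'red light' with 'pedestrian'."""
    grn, red, ped = g
    return (grn, ped, red)

# All ground-truth concept combinations (assumed to be equally represented).
worlds = list(product((0, 1), repeat=3))

# Both extractors yield the correct stop label on every combination...
label_acc_intended = sum(stop_label(intended(g)) == stop_label(g) for g in worlds) / len(worlds)
label_acc_shortcut = sum(stop_label(shortcut(g)) == stop_label(g) for g in worlds) / len(worlds)

# ...but the shortcut recovers the full concept vector only when red == ped.
concept_acc_shortcut = sum(shortcut(g) == g for g in worlds) / len(worlds)

# Number of deterministic concept remappings that leave the stop label unchanged:
# each combination can be sent to any combination with the same label.
n_label_preserving = 1
for g in worlds:
    n_label_preserving *= sum(stop_label(c) == stop_label(g) for c in worlds)

print(label_acc_intended, label_acc_shortcut, concept_acc_shortcut, n_label_preserving)
```

In this toy setting, label supervision alone cannot distinguish the two extractors, because both attain perfect label accuracy. Only one of the label-preserving remappings is the identity, which is why concept-level metrics are needed alongside label accuracy.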

Theorems & Definitions (2)

Example 1
Theorem 1: Misspecification of NeSy models marconato2023not
