Learning with Logical Constraints but without Shortcut Satisfaction

Zenan Li, Zehua Liu, Yuan Yao, Jingwei Xu, Taolue Chen, Xiaoxing Ma, Jian Lü

TL;DR

This paper addresses the shortcut satisfaction issue by introducing dual variables for logical connectives, and proposes a variational framework where the encoded logical constraint is expressed as a distributional loss that is compatible with the model's original training loss.

Abstract

Recent studies in neuro-symbolic learning have explored the integration of logical knowledge into deep learning via encoding logical constraints as an additional loss function. However, existing approaches tend to vacuously satisfy logical constraints through shortcuts, failing to fully exploit the knowledge. In this paper, we present a new framework for learning with logical constraints. Specifically, we address the shortcut satisfaction issue by introducing dual variables for logical connectives, encoding how the constraint is satisfied. We further propose a variational framework where the encoded logical constraint is expressed as a distributional loss that is compatible with the model's original training loss. The theoretical analysis shows that the proposed approach bears salient properties, and the experimental evaluations demonstrate its superior performance in both model generalizability and constraint satisfaction.

Paper Structure

This paper contains 32 sections, 11 theorems, 68 equations, 3 figures, 6 tables, 1 algorithm.

Key Result

Theorem 1

Given the logical formula $\alpha = \wedge_{i \in \mathcal{I}} \vee_{j \in \mathcal{J}} v_{ij} \leq c_{ij}$, if the dual variables $\{\mu_i, i \in \mathcal{I}\}$ and $\{\nu_{ij}, j \in \mathcal{J}\}$ of $S_{\alpha}(v)$ converge to $\{\mu_i^*, i \in \mathcal{I}\}$ and $\{\nu_{ij}^*, j \in \mathcal{J}\}$ ...

Figures (3)

  • Figure 1: Consider a semi-supervised classification task of handwritten digit recognition. For illustration purposes, we remove the labels of training images in class '6', but introduce a logical rule $P:=(f(R({\mathbf{x}})) = 9) \rightarrow Q:= (f(\mathbf{x}) = 6)$ to predict '6', where $R({\mathbf{x}})$ stands for rotating the image $\mathbf{x}$ by 180$^{\circ}$. The ideal satisfying assignment is $(P,Q)=(\textbf{T},\textbf{T})$ for class '6'. However, existing methods (e.g., DL2 [DL22019]) tend to satisfy the rule vacuously by discouraging the satisfaction of $P$ for all inputs, including those actually in class '6'. In contrast, our approach successfully learns to satisfy $Q$ when $P$ holds for class '6', even achieving accuracy (98.8%) comparable to the fully supervised setting.
  • Figure 2: The accuracy results (%) of image classification on the CIFAR100 dataset. The proposed approach outperforms the competitors in all three cases for both class and superclass classification.
  • Figure 3: The learning curves of different settings. Our algorithm with logical constraints (the red curves) significantly boosts the training efficiency compared to the plain case (the blue curves).
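
The shortcut described in the Figure 1 caption can be illustrated with a small sketch. It enumerates the truth assignments that satisfy an implication $P \rightarrow Q$ and separates the vacuous ones (where $P$ is false) from the intended one, $(P,Q)=(\textbf{T},\textbf{T})$. The function `soft_satisfaction` is a hypothetical relaxation chosen for illustration only (it treats $P$ and $Q$ as independent probabilities); it is not the encoding used in the paper or in DL2.

```python
from itertools import product


def implies(p: bool, q: bool) -> bool:
    """Classical material implication: P -> Q is equivalent to (not P) or Q."""
    return (not p) or q


def soft_satisfaction(p: float, q: float) -> float:
    """Illustrative relaxation (an assumption, not the paper's encoding).

    Treats p and q as independent probabilities of P and Q being true;
    the rule is violated only when P holds and Q does not.
    """
    return 1.0 - p * (1.0 - q)


# All Boolean assignments (P, Q) that satisfy the rule.
satisfying = [(p, q) for p, q in product([True, False], repeat=2) if implies(p, q)]

# Assignments that satisfy the rule only because the premise P is false.
vacuous = [(p, q) for p, q in satisfying if not p]

if __name__ == "__main__":
    print("satisfying:", satisfying)
    print("vacuous:", vacuous)
    # Driving p to 0 fully satisfies the relaxed rule regardless of q,
    # which is the shortcut: nothing is learned about Q.
    print(soft_satisfaction(0.0, 0.0), soft_satisfaction(0.0, 1.0))
```

Under this relaxation, a model can reach full satisfaction by pushing the probability of $P$ to zero for every input, which mirrors the vacuous behaviour the caption attributes to existing methods.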

Theorems & Definitions (17)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Proposition 1
  • proof
  • Proposition 2
  • proof
  • proof
  • Proposition 3
  • Proposition 4
  • ...and 7 more