Learning with Logical Constraints but without Shortcut Satisfaction

Zenan Li; Zehua Liu; Yuan Yao; Jingwei Xu; Taolue Chen; Xiaoxing Ma; Jian Lü

TL;DR

This paper addresses the shortcut satisfaction issue by introducing dual variables for logical connectives, and proposes a variational framework where the encoded logical constraint is expressed as a distributional loss that is compatible with the model's original training loss.

Abstract

Recent studies in neuro-symbolic learning have explored the integration of logical knowledge into deep learning via encoding logical constraints as an additional loss function. However, existing approaches tend to vacuously satisfy logical constraints through shortcuts, failing to fully exploit the knowledge. In this paper, we present a new framework for learning with logical constraints. Specifically, we address the shortcut satisfaction issue by introducing dual variables for logical connectives, encoding how the constraint is satisfied. We further propose a variational framework where the encoded logical constraint is expressed as a distributional loss that is compatible with the model's original training loss. The theoretical analysis shows that the proposed approach bears salient properties, and the experimental evaluations demonstrate its superior performance in both model generalizability and constraint satisfaction.
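To make the shortcut issue concrete, the sketch below encodes an implication P -> Q as an additional penalty on top of a task loss. The penalty `p * (1 - q)` is one common relaxation chosen here for illustration, not the translation proposed in the paper; the names `implication_penalty`, `total_loss`, and `weight` are hypothetical, and treating P and Q as independent is an assumption of the sketch.

```python
def implication_penalty(p: float, q: float) -> float:
    """Penalty for P -> Q, read as 'P holds and Q fails'.

    p and q are the model's probabilities that P and Q hold.
    Treating them as independent is an assumption of this sketch.
    """
    return p * (1.0 - q)


def total_loss(task_loss: float, p: float, q: float, weight: float = 1.0) -> float:
    """Task loss plus a weighted constraint penalty (weight is hypothetical)."""
    return task_loss + weight * implication_penalty(p, q)


# Intended satisfaction: P holds and the model also predicts Q.
intended = implication_penalty(p=1.0, q=1.0)
# Shortcut satisfaction: the model simply denies P, so Q is never learned.
shortcut = implication_penalty(p=0.0, q=0.0)
# Violation: P holds but Q is not predicted.
violated = implication_penalty(p=1.0, q=0.0)
```

Both the intended and the shortcut assignment drive this penalty to zero, so the penalty alone cannot tell them apart. This is the kind of ambiguity that the dual variables described in the abstract are meant to resolve.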

Paper Structure (32 sections, 11 theorems, 68 equations, 3 figures, 6 tables, 1 algorithm)

Introduction
Logic to Loss Function Translation
Logical Constraints
Logical Constraint Translation
Advantages of Our Translation
A Variational Learning Framework
Distributional Loss for Logical Constraints
Optimization Procedure
Experiments and Results
Handwritten Digit Recognition
Handwritten Formula Recognition
Shortest Distance Prediction
Image Classification
Related Work
Conclusion
...and 17 more sections

Key Result

Theorem 1

Given the logical formula $\alpha = \wedge_{i \in \mathcal{I}} \vee_{j \in \mathcal{J}} v_{ij} \leq c_{ij}$, if the dual variables $\{\mu_i, i \in \mathcal{I}\}$ and $\{\nu_{ij}, j \in \mathcal{J}\}$ of $S_{\alpha}(v)$ converge to $\{\mu_i^*, i \in \mathcal{I}\}$ and $\{\nu_{ij}^*, j \in \mathcal{J
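The formula shape used in Theorem 1, a conjunction over $i$ of disjunctions over $j$ of atoms $v_{ij} \leq c_{ij}$, can be checked numerically. The following is a minimal sketch: `holds` evaluates the formula exactly, and `violation` is a hypothetical max/min hinge measure introduced only for illustration (it is not the paper's $S_{\alpha}(v)$ and uses no dual variables). Both functions assume non-empty nested lists of equal shape.

```python
from typing import Sequence

Matrix = Sequence[Sequence[float]]


def holds(v: Matrix, c: Matrix) -> bool:
    """True iff every clause i contains some atom j with v[i][j] <= c[i][j]."""
    return all(
        any(v_ij <= c_ij for v_ij, c_ij in zip(v_i, c_i))
        for v_i, c_i in zip(v, c)
    )


def violation(v: Matrix, c: Matrix) -> float:
    """Hinge-style violation: worst clause, measured by its best atom.

    Equals 0.0 exactly when holds(v, c) is True.
    """
    return max(
        min(max(v_ij - c_ij, 0.0) for v_ij, c_ij in zip(v_i, c_i))
        for v_i, c_i in zip(v, c)
    )


c = [[0.5, 0.5], [0.5, 0.5]]
v_ok = [[0.2, 0.9], [0.7, 0.1]]   # each clause has one satisfied atom
v_bad = [[1.0, 2.0], [0.7, 0.1]]  # first clause has no satisfied atom
```

Here `holds(v_ok, c)` is true with zero violation, while `v_bad` fails the first clause and its violation is the smallest gap in that clause.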

Figures (3)

Figure 1: Consider a semi-supervised classification task of handwritten digit recognition. For illustration purposes, we remove the labels of training images in class '6', but introduce a logical rule $P:=(f(R({\mathbf{x}})) = 9) \rightarrow Q:= (f(\mathbf{x}) = 6)$ to predict '6', where $R({\mathbf{x}})$ stands for rotating the image $\mathbf{x}$ by 180$^{\circ}$. The ideal satisfying assignment is $(P,Q)=(\textbf{T},\textbf{T})$ for class '6'. However, existing methods (e.g., DL2 [DL22019]) tend to vacuously satisfy the rule by discouraging the satisfaction of $P$ for all inputs, including those actually in class '6'. In contrast, our approach successfully learns to satisfy $Q$ when $P$ holds for class '6', even achieving accuracy (98.8%) comparable to the fully supervised setting.
Figure 2: The accuracy results (%) of image classification on the CIFAR100 dataset. The proposed approach outperforms the competitors in all three cases for both class and superclass classification.
Figure 3: The learning curves of different settings. Our algorithm with logical constraints (the red curves) significantly boosts the training efficiency compared to the plain case (the blue curves).
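The rotation rule from Figure 1 can be sketched in a few lines. This is only an illustration: `rotate_180` plays the role of $R(\mathbf{x})$ on a toy grid, `f` is a hypothetical classifier passed in by the caller, and `rule_holds` checks the Boolean rule on a single input rather than computing any training loss.

```python
from itertools import product
from typing import Callable, List

Image = List[List[int]]


def rotate_180(image: Image) -> Image:
    """Rotate a 2-D grid (list of rows) by 180 degrees."""
    return [row[::-1] for row in image[::-1]]


def implies(p: bool, q: bool) -> bool:
    """Material implication P -> Q."""
    return (not p) or q


def rule_holds(f: Callable[[Image], int], x: Image) -> bool:
    """Check (f(R(x)) == 9) -> (f(x) == 6) for one input x."""
    return implies(f(rotate_180(x)) == 9, f(x) == 6)


# All (P, Q) assignments that satisfy the rule. Only (True, True) is the
# intended one for class '6'; the two with P False are vacuous.
satisfying = [
    (p, q) for p, q in product([True, False], repeat=2) if implies(p, q)
]
```

A classifier that never outputs 9 on rotated images satisfies `rule_holds` on every input without ever predicting 6, which is the vacuous behavior the Figure 1 caption attributes to existing methods.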

Theorems & Definitions (17)

Theorem 1
Theorem 2
Theorem 3
Proposition 1
proof
Proposition 2
proof
proof
Proposition 3
Proposition 4
...and 7 more
