Convex and Bilevel Optimization for Neuro-Symbolic Inference and Learning

Charles Dickens; Changyu Gao; Connor Pryor; Stephen Wright; Lise Getoor

TL;DR

This work develops a principled gradient-based learning framework for neural-symbolic systems by casting NeSy learning as a bilevel optimization problem and smoothing the lower-level energy with the Moreau envelope. It introduces a smooth LCQP formulation for NeuPSL inference and a dual-BCD method that exploits warm starts and parallelization to achieve substantial runtime speedups (up to $100\times$) while enabling explicit gradient computation with respect to both neural and symbolic weights. Empirical results across eight datasets show improvements in both learning efficiency and predictive performance, including up to $16$ percentage-point gains on MNIST-Add and competitive gains on standard HL-MRF benchmarks. Overall, the framework provides a scalable, end-to-end trainable pathway for integrating neural perception with symbolic reasoning in NeSy systems, with broad applicability to NeuPSL and related NeSy models.
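To make the smoothing idea concrete, here is a minimal one-dimensional sketch of a Moreau envelope. It uses the scalar hinge $h(y) = \max(0, y)$ as a stand-in for a nonsmooth function; the paper applies the envelope to the lower-level energy, not to a single hinge, so this is only an illustration. The function names and the smoothing parameter `mu` are assumptions introduced for this example.

```python
def hinge(y):
    """Nonsmooth scalar hinge h(y) = max(0, y)."""
    return max(0.0, y)


def prox_hinge(x, mu):
    """Minimizer of h(y) + (x - y)^2 / (2 * mu) over y, for mu > 0."""
    if x > mu:
        return x - mu      # hinge is active: shift down by mu
    if x >= 0.0:
        return 0.0         # pulled to the kink at zero
    return x               # hinge is inactive: nothing to shrink


def moreau_envelope_hinge(x, mu):
    """Envelope value: min over y of h(y) + (x - y)^2 / (2 * mu)."""
    y = prox_hinge(x, mu)
    return hinge(y) + (x - y) ** 2 / (2.0 * mu)


def grad_moreau_envelope_hinge(x, mu):
    """Gradient of the envelope, (x - prox(x)) / mu; defined everywhere."""
    return (x - prox_hinge(x, mu)) / mu


if __name__ == "__main__":
    for x in (-1.0, 0.25, 2.0):
        print(x, moreau_envelope_hinge(x, 0.5), grad_moreau_envelope_hinge(x, 0.5))
```

The envelope agrees with the hinge up to at most `mu / 2` and has a gradient at every point, including the kink at zero, which is the property that makes gradient-based learning possible on the smoothed objective.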

Abstract

We leverage convex and bilevel optimization techniques to develop a general gradient-based parameter learning framework for neural-symbolic (NeSy) systems. We demonstrate our framework with NeuPSL, a state-of-the-art NeSy architecture. To achieve this, we propose a smooth primal and dual formulation of NeuPSL inference and show that the learning gradients are functions of the optimal dual variables. Additionally, we develop a dual block coordinate descent algorithm for the new formulation that naturally exploits warm starts. This leads to over 100x learning runtime improvements over the current best NeuPSL inference method. Finally, we provide extensive empirical evaluations across 8 datasets covering a range of tasks and demonstrate that our learning framework achieves up to a 16 percentage-point improvement in prediction performance over alternative learning methods.
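The benefit of warm starts for a coordinate descent method can be illustrated on a toy problem. The sketch below is not the paper's dual BCD algorithm: it is plain projected coordinate descent (blocks of size one) on a small nonnegativity-constrained quadratic program, and the matrix `A`, the vectors `b_old` and `b_new`, and the function name are hypothetical values chosen for illustration. It re-solves a slightly perturbed problem twice, once from zero and once from the previous solution.

```python
def coordinate_descent_qp(A, b, lam0, tol=1e-10, max_sweeps=10_000):
    """Minimize 0.5 * lam^T A lam + b^T lam subject to lam >= 0.

    Returns the solution and the number of sweeps taken. A is assumed
    symmetric positive definite, so every diagonal entry is positive.
    """
    lam = list(lam0)
    n = len(lam)
    for sweep in range(1, max_sweeps + 1):
        max_change = 0.0
        for i in range(n):
            # Partial derivative of the objective with respect to lam[i].
            grad_i = sum(A[i][j] * lam[j] for j in range(n)) + b[i]
            # Exact minimization along coordinate i, projected onto lam[i] >= 0.
            new_val = max(0.0, lam[i] - grad_i / A[i][i])
            max_change = max(max_change, abs(new_val - lam[i]))
            lam[i] = new_val
        if max_change < tol:
            return lam, sweep
    return lam, max_sweeps


A = [[2.0, 1.0], [1.0, 2.0]]
b_old = [-1.0, -1.0]     # problem solved at the previous learning step
b_new = [-1.01, -1.0]    # slightly perturbed problem at the current step

lam_old, _ = coordinate_descent_qp(A, b_old, [0.0, 0.0])
lam_cold, cold_sweeps = coordinate_descent_qp(A, b_new, [0.0, 0.0])
lam_warm, warm_sweeps = coordinate_descent_qp(A, b_new, lam_old)

if __name__ == "__main__":
    print("cold start sweeps:", cold_sweeps)
    print("warm start sweeps:", warm_sweeps)
```

Both runs reach the same solution, but the warm-started run begins closer to it and so needs fewer sweeps. During learning, the parameters typically change only a little between consecutive steps, so a similar effect is plausible whenever inference is re-solved repeatedly.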

Paper Structure (38 sections, 9 theorems, 70 equations, 3 figures, 14 tables, 3 algorithms)

Introduction
Related Work
NeSy Energy-Based Models
A Bilevel NeSy Learning Framework
Deep Hinge-loss Markov Random Fields
A smooth formulation of inference
Continuity of inference
Dual block coordinate descent
Empirical evaluation
Inference runtime
Learning runtime
Learning prediction performance
Limitations
Conclusions and future work
Appendix
...and 23 more sections

Key Result

Theorem 5.2

Suppose that for any setting of $\mathbf{w}_{nn} \in \mathbb{R}^{n_g}$ there is a feasible solution to NeuPSL inference (eq:regularized_lcqp_primal). Further, suppose $\epsilon > 0$, $\mathbf{w}_{sy} \in \mathbb{R}_{+}^{r}$, and $\mathbf{w}_{nn} \in \mathbb{R}^{n_g}$. Then:

Figures (3)

Figure 1: Example of MNIST-Add1 and MNIST-Add2.
Figure 2: Summarized NeuPSL MNIST-Add1 Symbolic Model. The full model is available at: https://github.com/convexbilevelnesylearning/experimentscripts/mnist_addition/neupsl_models.
Figure 3: Summarized NeuPSL MNIST-Add2 Symbolic Model. The full model is available at: https://github.com/convexbilevelnesylearning/experimentscripts/mnist_addition/neupsl_models.

Theorems & Definitions (15)

Definition 5.1
Theorem 5.2
Theorem 4.1: boyd:book04 p. 81
Theorem 4.2: boyd:book04 p. 81
Definition 4.3: Convex Subgradient: boyd:book04 and shalevshwartz:ftml11
Definition 4.4: Closedness: bertsekas:book09
Definition 4.5: Lower Semicontinuity: bertsekas:book09
Theorem 4.6: Closedness and Semicontinuity: bertsekas:book09 Proposition 1.1.2.
Definition 4.7: Regular Subgradient: rockafellar:book97 Definition 8.3
Theorem 4.8: Chain Rule for Regular Subgradients: rockafellar:book97 Theorem 10.6
...and 5 more
