Convex and Bilevel Optimization for Neuro-Symbolic Inference and Learning

Charles Dickens, Changyu Gao, Connor Pryor, Stephen Wright, Lise Getoor

TL;DR

This work develops a principled gradient-based learning framework for neural-symbolic (NeSy) systems by casting NeSy learning as a bilevel optimization problem and smoothing the lower-level energy with the Moreau envelope. It introduces a smooth LCQP formulation for NeuPSL inference and a dual-BCD method that exploits warm starts and parallelization to achieve substantial learning runtime speedups (over $100\times$) while enabling explicit gradient computation with respect to both neural and symbolic weights. Empirical results across eight datasets show improvements in both learning efficiency and predictive performance, including gains of up to $16$ percentage points on MNIST-Add and competitive gains on standard HL-MRF benchmarks. Overall, the framework provides a scalable, end-to-end trainable pathway for integrating neural perception with symbolic reasoning, with broad applicability to NeuPSL and related NeSy models.
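To make the smoothing step concrete, here is a minimal sketch of a Moreau envelope on a one-dimensional hinge, $f(x) = \max(0, x)$. It is an illustration only and not the paper's energy function: the function names and the smoothing parameter `lam` are assumptions introduced for this example. The envelope $\min_x f(x) + \frac{1}{2\lambda}(x - v)^2$ is differentiable everywhere, with gradient $(v - \mathrm{prox}(v))/\lambda$, even though the hinge itself has a kink at zero.

```python
import numpy as np


def hinge(x):
    """Nonsmooth convex function f(x) = max(0, x)."""
    return np.maximum(0.0, x)


def prox_hinge(v, lam):
    """Closed-form minimizer of hinge(x) + (x - v)**2 / (2 * lam)."""
    v = np.asarray(v, dtype=float)
    # v > lam: shift left by lam; v < 0: unchanged; otherwise clamp to 0.
    return np.where(v > lam, v - lam, np.where(v < 0.0, v, 0.0))


def moreau_envelope_hinge(v, lam):
    """Value of the Moreau envelope of the hinge at v."""
    v = np.asarray(v, dtype=float)
    x = prox_hinge(v, lam)
    return hinge(x) + (x - v) ** 2 / (2.0 * lam)


def moreau_grad_hinge(v, lam):
    """Gradient of the envelope: (v - prox(v)) / lam."""
    v = np.asarray(v, dtype=float)
    return (v - prox_hinge(v, lam)) / lam


points = np.array([-1.0, 0.25, 2.0])
envelope_values = moreau_envelope_hinge(points, lam=0.5)
envelope_grads = moreau_grad_hinge(points, lam=0.5)
```

For the hinge the envelope is a Huber-like function: zero for negative inputs, quadratic on $[0, \lambda]$, and linear with slope one beyond $\lambda$.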

Abstract

We leverage convex and bilevel optimization techniques to develop a general gradient-based parameter learning framework for neural-symbolic (NeSy) systems. We demonstrate our framework with NeuPSL, a state-of-the-art NeSy architecture. To achieve this, we propose a smooth primal and dual formulation of NeuPSL inference and show that the learning gradients are functions of the optimal dual variables. Additionally, we develop a dual block coordinate descent algorithm for the new formulation that naturally exploits warm starts. This leads to over 100x learning runtime improvements over the current best NeuPSL inference method. Finally, we provide extensive empirical evaluations across 8 datasets covering a range of tasks and demonstrate that our learning framework achieves up to a 16 percentage point improvement in prediction performance over alternative learning methods.
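The role of warm starts in a dual coordinate method can be sketched on a toy problem. The example below is a simplified stand-in, not the paper's algorithm: it uses single-coordinate updates (block size one) on a generic strongly convex QP, $\min_y \tfrac{1}{2} y^\top P y + q^\top y$ subject to $A y \le b$, and every name in it (`dual_coordinate_descent`, `P_inv`, `mu0`) is hypothetical. The dual is $\min_{\mu \ge 0} \tfrac{1}{2} \mu^\top H \mu + c^\top \mu$ with $H = A P^{-1} A^\top$ and $c = A P^{-1} q + b$, and the primal solution is recovered as $y = -P^{-1}(q + A^\top \mu)$.

```python
import numpy as np


def dual_coordinate_descent(P_inv, q, A, b, mu0=None, tol=1e-10, max_sweeps=10_000):
    """Solve min_y 0.5*y'Py + q'y s.t. Ay <= b via coordinate descent on the dual.

    Assumes P is positive definite (P_inv is its inverse) and A has no zero rows,
    so every diagonal entry of H is positive.
    """
    H = A @ P_inv @ A.T
    c = A @ P_inv @ q + b
    # Warm start from a previous dual solution when one is supplied.
    mu = np.zeros(len(b)) if mu0 is None else np.array(mu0, dtype=float)
    sweep = 0
    for sweep in range(1, max_sweeps + 1):
        max_change = 0.0
        for i in range(len(mu)):
            grad_i = H[i] @ mu + c[i]
            # Exact minimization over coordinate i, projected onto mu_i >= 0.
            new_mu_i = max(0.0, mu[i] - grad_i / H[i, i])
            max_change = max(max_change, abs(new_mu_i - mu[i]))
            mu[i] = new_mu_i
        if max_change < tol:
            break
    y = -P_inv @ (q + A.T @ mu)
    return y, mu, sweep


P_inv = np.eye(2)
A = np.array([[1.0, 1.0], [1.0, 0.0]])
b = np.array([1.0, 0.3])

# First solve from a zero dual vector.
y_cold, mu_cold, sweeps_cold = dual_coordinate_descent(P_inv, np.array([-2.0, -2.0]), A, b)

# The linear term changes slightly, as it would after a small parameter update.
q_new = np.array([-2.1, -2.0])
_, _, sweeps_from_zero = dual_coordinate_descent(P_inv, q_new, A, b)
y_warm, mu_warm, sweeps_warm = dual_coordinate_descent(P_inv, q_new, A, b, mu0=mu_cold)
```

On this toy problem the warm-started solve reaches the tolerance in fewer sweeps than the solve from zero, because the previous dual solution is already close to the new optimum. In a learning loop where parameters change a little between inference calls, that is the effect a warm start is meant to capture.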

Paper Structure

This paper contains 38 sections, 9 theorems, 70 equations, 3 figures, 14 tables, 3 algorithms.

Key Result

Theorem 5.2

Suppose that for any setting of $\mathbf{w}_{nn} \in \mathbb{R}^{n_g}$ there is a feasible solution to the NeuPSL inference problem (eq:regularized_lcqp_primal). Further, suppose $\epsilon > 0$, $\mathbf{w}_{sy} \in \mathbb{R}_{+}^{r}$, and $\mathbf{w}_{nn} \in \mathbb{R}^{n_g}$. Then:

Figures (3)

  • Figure 1: Example of MNIST-Add1 and MNIST-Add2.
  • Figure 2: Summarized NeuPSL MNIST-Add1 Symbolic Model. The full model is available at: https://github.com/convexbilevelnesylearning/experimentscripts/mnist_addition/neupsl_models.
  • Figure 3: Summarized NeuPSL MNIST-Add2 Symbolic Model. The full model is available at: https://github.com/convexbilevelnesylearning/experimentscripts/mnist_addition/neupsl_models.

Theorems & Definitions (15)

  • Definition 5.1
  • Theorem 5.2
  • Theorem 4.1 (boyd:book04, p. 81)
  • Theorem 4.2 (boyd:book04, p. 81)
  • Definition 4.3: Convex Subgradient (boyd:book04 and shalevshwartz:ftml11)
  • Definition 4.4: Closedness (bertsekas:book09)
  • Definition 4.5: Lower Semicontinuity (bertsekas:book09)
  • Theorem 4.6: Closedness and Semicontinuity (bertsekas:book09, Proposition 1.1.2)
  • Definition 4.7: Regular Subgradient (rockafellar:book97, Definition 8.3)
  • Theorem 4.8: Chain Rule for Regular Subgradients (rockafellar:book97, Theorem 10.6)
  • ...and 5 more