
Softened Symbol Grounding for Neuro-symbolic Systems

Zenan Li, Yuan Yao, Taolue Chen, Jingwei Xu, Chun Cao, Xiaoxing Ma, Jian Lü

TL;DR

This work tackles the symbol grounding bottleneck in neuro-symbolic systems by proposing a softened grounding framework that models the input-symbol mapping as a Boltzmann distribution $Q_{\boldsymbol{\phi}}$ and gradually sharpens it through annealing. It introduces a projection-based MCMC sampling scheme, aided by SMT solvers, to efficiently explore the feasible symbol space $\mathcal{S}_{\mathbf{y}}$, and provides a convergence analysis for stochastic optimization with biased gradient estimates. A two-stage training protocol (annealing Stage I and zero-degree Stage II) is demonstrated across handwriting formula evaluation, visual Sudoku, and shortest-path tasks, showing superior performance and grounding efficiency versus state-of-the-art baselines. The results suggest that explicit, probabilistic symbol grounding coupled with targeted projection sampling can significantly enhance the integration of neural perception and symbolic reasoning, with broad implications for scalable neuro-symbolic learning and semi-supervised reasoning.
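To make the softened grounding concrete, here is a minimal Python sketch of a Boltzmann distribution over a small set of feasible groundings and how it sharpens as the temperature is lowered. It assumes that the energy of a grounding is its negative log-probability under the network. This is our simplifying assumption and not necessarily the paper's exact parameterization of $Q_{\boldsymbol{\phi}}$. The linear schedule, the names `boltzmann` and `linear_schedule`, and the probability values are all hypothetical.

```python
import math


def boltzmann(energies, temperature):
    """Q(z) proportional to exp(-E(z) / T) over a finite set of feasible groundings."""
    e_min = min(energies.values())  # shift energies for numerical stability
    weights = {z: math.exp(-(e - e_min) / temperature) for z, e in energies.items()}
    total = sum(weights.values())
    return {z: w / total for z, w in weights.items()}


def linear_schedule(t_start, t_end, num_steps):
    # Hypothetical linear annealing schedule from t_start down to t_end (num_steps >= 2).
    return [t_start + (t_end - t_start) * k / (num_steps - 1) for k in range(num_steps)]


# Hypothetical network probabilities of three feasible groundings for y = 6.
probs = {("2", "*", "3"): 0.5, ("3", "*", "2"): 0.3, ("4", "+", "2"): 0.2}
energies = {z: -math.log(p) for z, p in probs.items()}  # assumed energy: -log p(z | x)

for temp in linear_schedule(1.0, 0.05, 4):
    q = boltzmann(energies, temp)
    print(f"T={temp:.2f}", {"".join(z): round(p, 3) for z, p in q.items()})
```

At T = 1 the distribution reproduces the renormalized network probabilities. As T approaches 0, almost all of the mass moves to the single most likely feasible grounding. This mirrors how annealing gradually turns a soft grounding into a hard one.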

Abstract

Neuro-symbolic learning generally consists of two separate worlds, i.e., neural network training and symbolic constraint solving, whose success hinges on symbol grounding, a fundamental problem in AI. This paper presents a novel, softened symbol grounding process that bridges the gap between the two worlds, resulting in an effective and efficient neuro-symbolic learning framework. Technically, the framework features (1) modeling of symbol solution states as a Boltzmann distribution, which avoids expensive state searching and facilitates mutually beneficial interactions between network training and symbolic reasoning; (2) a new MCMC technique leveraging projection and SMT solvers, which efficiently samples from disconnected symbol solution spaces; (3) an annealing mechanism that can escape from sub-optimal symbol groundings. Experiments with three representative neuro-symbolic learning tasks demonstrate that, owing to its superior symbol grounding capability, our framework successfully solves problems well beyond the frontier of the existing proposals.
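The second ingredient, sampling from a disconnected feasible space via projection, can be illustrated on a toy version of the handwritten-formula task. This Python sketch is not the authors' algorithm, and it rests on several assumptions. Brute-force enumeration of a tiny feasible set $\mathcal{S}_{\mathbf{y}}$ stands in for the SMT solver; the set contains the 3-token formulas `digit op digit` that equal y. The projection returns the feasible groundings closest in Hamming distance. The proposal perturbs one random position. The Metropolis-Hastings correction is computed exactly by enumerating the proposal distribution. The network probabilities (`NET`) and all function names are hypothetical.

```python
import itertools
import math
import random

DIGITS = [str(d) for d in range(10)]
OPS = ["+", "-", "*"]
VOCAB = DIGITS + OPS


def evaluate(z):
    """Value of a 3-token formula 'digit op digit', or None if z is malformed."""
    a, op, b = z
    if a not in DIGITS or op not in OPS or b not in DIGITS:
        return None
    x, w = int(a), int(b)
    return {"+": x + w, "-": x - w, "*": x * w}[op]


def feasible_set(y):
    # Brute-force stand-in for the SMT solver: enumerate the groundings consistent with y.
    return [z for z in itertools.product(VOCAB, repeat=3) if evaluate(z) == y]


def project(z_tilde, feasible):
    # Projection: every feasible grounding at minimal Hamming distance from z_tilde.
    def dist(z):
        return sum(a != b for a, b in zip(z, z_tilde))
    d_min = min(dist(z) for z in feasible)
    return [z for z in feasible if dist(z) == d_min]


def proposal_probs(z, feasible):
    """Exact q(z' | z): change one random position to a random other symbol,
    then project, breaking ties uniformly."""
    moves = [(i, s) for i in range(len(z)) for s in VOCAB if s != z[i]]
    probs = {}
    for i, s in moves:
        nearest = project(z[:i] + (s,) + z[i + 1:], feasible)
        for z_new in nearest:
            probs[z_new] = probs.get(z_new, 0.0) + 1.0 / (len(moves) * len(nearest))
    return probs


# Hypothetical per-position network probabilities favouring the reading "2*3".
NET = [{s: (0.5 if s == fav else 0.5 / (len(VOCAB) - 1)) for s in VOCAB} for fav in "2*3"]


def energy(z):
    # Energy = negative log-probability of the grounding under the network.
    return -sum(math.log(NET[i][s]) for i, s in enumerate(z))


def acceptance(z, z_new, feasible, temperature):
    """Metropolis-Hastings acceptance probability for the projected proposal."""
    q_fwd = proposal_probs(z, feasible).get(z_new, 0.0)
    if q_fwd == 0.0:
        return 0.0
    q_bwd = proposal_probs(z_new, feasible).get(z, 0.0)
    log_ratio = (energy(z) - energy(z_new)) / temperature
    return min(1.0, math.exp(log_ratio) * q_bwd / q_fwd)


def projection_mcmc(y, temperature, num_steps, seed=0):
    rng = random.Random(seed)
    feasible = feasible_set(y)
    z = rng.choice(feasible)
    samples = []
    for _ in range(num_steps):
        q = proposal_probs(z, feasible)
        z_new = rng.choices(list(q), weights=list(q.values()))[0]
        if rng.random() < acceptance(z, z_new, feasible, temperature):
            z = z_new
        samples.append(z)
    return samples


samples = projection_mcmc(6, temperature=1.0, num_steps=500)
print(samples[-5:])
```

Every sample is feasible by construction, because each proposal is projected back onto $\mathcal{S}_{\mathbf{y}}$ before the accept/reject step. The exact forward and backward proposal probabilities keep the target distribution invariant. Computing them this way is affordable here only because the toy space is tiny.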

Paper Structure (18 sections, 2 theorems, 22 equations, 4 figures, 2 tables, 1 algorithm)

Key Result

Proposition 1

Assume the loss function $\ell(\bm{\theta})$ is $L$-Lipschitz and $\ell$-smooth, and let the actual sampling distribution be $\widehat{Q}$. Then, if the total variation distance $d_{\text{tv}}(\widehat{Q}, Q^*)$ is bounded by $\epsilon$, it holds after $K$ steps of stochastic gradient descent with … where $\Delta_0 = \ell(\bm{\theta}_0) - \min \ell(\bm{\theta})$, and $n$ is the cardinality of …
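Proposition 1 measures the effect of an imperfect sampler through $d_{\text{tv}}(\widehat{Q}, Q^*)$. A standard inequality gives one intuition for why this is the relevant quantity; it is not the paper's proof. Suppose every per-sample gradient $g(\mathbf{z})$ has norm at most $G$. Then $\|\mathbb{E}_{\widehat{Q}}[g] - \mathbb{E}_{Q^*}[g]\| \le 2G\, d_{\text{tv}}(\widehat{Q}, Q^*)$, so an $\epsilon$-accurate sampler gives gradient estimates with bias at most $2G\epsilon$. The Python sketch below checks this numerically on random, hypothetical distributions and gradients. The bound $G$ is our assumption and need not coincide with the constant $L$ in the proposition.

```python
import math
import random


def tv_distance(p, q):
    """Total variation distance between two distributions on the same finite support."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))


def expected_gradient(dist, grads):
    # E_{z ~ dist}[g(z)] on a finite support, where grads[k] = g(z_k).
    dim = len(grads[0])
    return [sum(w * g[j] for w, g in zip(dist, grads)) for j in range(dim)]


def norm(v):
    return math.sqrt(sum(x * x for x in v))


def random_simplex(rng, n):
    # A random probability vector (not uniformly distributed on the simplex).
    w = [rng.random() for _ in range(n)]
    total = sum(w)
    return [x / total for x in w]


rng = random.Random(0)
num_states, dim = 8, 5
q_star = random_simplex(rng, num_states)                    # target Q*
noise = random_simplex(rng, num_states)
q_hat = [0.9 * a + 0.1 * b for a, b in zip(q_star, noise)]  # imperfect sampler Q-hat
grads = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_states)]

g_hat = expected_gradient(q_hat, grads)
g_star = expected_gradient(q_star, grads)
bias = norm([a - b for a, b in zip(g_hat, g_star)])
bound = 2 * tv_distance(q_hat, q_star) * max(norm(g) for g in grads)
print(f"bias = {bias:.4f}, bound = {bound:.4f}")
```

The bound follows from $\|\sum_k (\hat q_k - q^*_k)\, g_k\| \le \sum_k |\hat q_k - q^*_k|\, \|g_k\| \le 2G\, d_{\text{tv}}$. The gap between the bias and the bound shows how loose this worst case can be.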

Figures (4)

  • Figure 1: An example neural-symbolic system for handwritten formula evaluation. It takes a handwritten arithmetic expression $\mathbf{x}$ as input and evaluates the expression to output $\mathbf{y}$. The neural network component $\mathcal{M}_{\bm{\theta}}$ recognizes the symbols $\mathbf{z}$ (i.e., digits and operators) in the expression, and the symbolic component evaluates the recognized formula, e.g., with the Python function 'eval'. The challenge in training $\mathcal{M}_{\bm{\theta}}$ comes from the lack of explicit $\mathbf{z}$ to bridge the gap between the neural world ($\mathbf{x}$ to $\mathbf{z}$) and the symbol world ($\mathbf{z}$ to $\mathbf{y}$). Through softened symbol grounding, model training and constraint satisfaction join forces to resolve the latent $\mathbf{z}$ so that it fits both the given $\mathbf{x}$ and $\mathbf{y}$.
  • Figure 2: Accuracy (%) of the HWF task. Our methods (i.e., Stage I+II) perform much better than the comparison methods.
  • Figure 3: Accuracy (%) of the SDSP task. Our methods outperform the competitors and come close to the direct-supervision case.
  • Figure 4: Training curves (the first row) and test curves (the second row) of different approaches. We only plot the curves for some of the methods for brevity. Our approaches (Log and Linear) achieve the best symbol accuracy on the training set, and also generalize better to the test set.
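Figure 1 turns on the fact that only $\mathbf{y}$ is observed, so many latent groundings $\mathbf{z}$ can explain the same output. The following minimal Python sketch illustrates this ambiguity under two assumptions: operands are single digits, and only +, -, * appear. It enumerates every formula of a fixed length and keeps those whose evaluation equals y, giving a brute-force version of the feasible set $\mathcal{S}_{\mathbf{y}}$. The function names are hypothetical.

```python
import itertools

DIGITS = "0123456789"
OPS = "+-*"


def symbolic_component(z):
    # Evaluate the recognized symbols with Python's eval, as suggested in Figure 1.
    # Acceptable here only because z is built from the fixed DIGITS/OPS vocabulary.
    return eval("".join(z))


def groundings_consistent_with(y, num_digits=3):
    """All formulas 'd op d ... op d' with num_digits single digits that evaluate to y."""
    matches = []
    for digits in itertools.product(DIGITS, repeat=num_digits):
        for ops in itertools.product(OPS, repeat=num_digits - 1):
            z = [digits[0]]
            for op, d in zip(ops, digits[1:]):
                z += [op, d]
            if symbolic_component(z) == y:
                matches.append("".join(z))
    return matches


candidates = groundings_consistent_with(6)
print(len(candidates), candidates[:5])
```

Even for short formulas the list of consistent groundings is long, and exhaustive enumeration grows exponentially with formula length. This is one motivation for sampling from $\mathcal{S}_{\mathbf{y}}$ rather than searching it.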

Theorems & Definitions (4)

  • Proposition 1
  • Proposition 2
  • Proof
  • Proof