
On the Hardness of Probabilistic Neurosymbolic Learning

Jaron Maene, Vincent Derkinderen, Luc De Raedt

TL;DR

The paper studies the complexity of differentiating probabilistic reasoning in neurosymbolic learning. It shows that computing these gradients reduces to Weighted Model Counting (WMC) and vice versa. It then proves that approximating them is intractable in general but becomes tractable during training, as the neural network grows confident in its predictions. It introduces WeightME, an unbiased gradient estimator based on weighted model sampling. Under mild assumptions, WeightME comes with PAC-style probabilistic guarantees while needing only a logarithmic number of SAT-solver calls. The authors also critically evaluate existing biased gradient approximations, showing that they struggle to optimize even when exact inference is still feasible, and support these findings with experiments on CNF benchmarks from the Model Counting Competitions. Overall, the work argues for principled, SAT-backed gradient estimators in neurosymbolic learning and outlines directions for extending these results to more expressive settings.

Abstract

The limitations of purely neural learning have sparked an interest in probabilistic neurosymbolic models, which combine neural networks with probabilistic logical reasoning. As these neurosymbolic models are trained with gradient descent, we study the complexity of differentiating probabilistic reasoning. We prove that although approximating these gradients is intractable in general, it becomes tractable during training. Furthermore, we introduce WeightME, an unbiased gradient estimator based on model sampling. Under mild assumptions, WeightME approximates the gradient with probabilistic guarantees using a logarithmic number of calls to a SAT solver. Lastly, we evaluate the necessity of these guarantees on the gradient. Our experiments indicate that the existing biased approximations indeed struggle to optimize even when exact solving is still feasible.
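The abstract's idea of estimating the gradient by sampling models rests on a standard identity: the gradient of log WMC(φ) equals the expectation, over satisfying assignments ω drawn in proportion to their weight, of the gradient of log P(ω). Averaging the per-variable score 1/p_i (if x_i is true) or −1/(1 − p_i) (if false) over such samples therefore gives an unbiased estimate of the gradient of log WMC. The minimal sketch below assumes independent Bernoulli probabilities p_i and replaces the SAT-backed weighted model sampler with exact enumeration. It is only a toy illustration of the principle, not the authors' WeightME, which per the abstract relies on SAT-solver calls and comes with probabilistic guarantees. All function names and the toy formula are hypothetical.

```python
import random
from itertools import product


def satisfies(cnf, assignment):
    """True if the assignment (index i = variable i+1) satisfies every clause."""
    return all(any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause) for clause in cnf)


def weight(assignment, probs):
    """Probability of a full assignment under independent Bernoulli(p_i) variables."""
    w = 1.0
    for value, p in zip(assignment, probs):
        w *= p if value else 1.0 - p
    return w


def weighted_models(cnf, probs):
    """All models with their weights (exponential; toy stand-in for a SAT-based sampler)."""
    models = [a for a in product([False, True], repeat=len(probs)) if satisfies(cnf, a)]
    return models, [weight(m, probs) for m in models]


def estimate_grad_log_wmc(cnf, probs, n_samples, seed=0):
    """Unbiased Monte Carlo estimate of d log WMC / d p_i from models sampled ~ P(omega | phi)."""
    rng = random.Random(seed)
    models, weights = weighted_models(cnf, probs)
    samples = rng.choices(models, weights=weights, k=n_samples)
    grad = [0.0] * len(probs)
    for model in samples:
        for i, (value, p) in enumerate(zip(model, probs)):
            # Score of one variable: d/dp log p = 1/p, d/dp log(1 - p) = -1/(1 - p).
            grad[i] += (1.0 / p) if value else (-1.0 / (1.0 - p))
    return [g / n_samples for g in grad]


def exact_grad_log_wmc(cnf, probs):
    """Exact reference: the same expectation, summed over all models instead of sampled."""
    models, weights = weighted_models(cnf, probs)
    total = sum(weights)
    grad = [0.0] * len(probs)
    for model, w in zip(models, weights):
        for i, (value, p) in enumerate(zip(model, probs)):
            grad[i] += (w / total) * ((1.0 / p) if value else (-1.0 / (1.0 - p)))
    return grad


cnf = [[1, 2], [-1, 3]]  # hypothetical toy formula: (x1 or x2) and (not x1 or x3)
probs = [0.3, 0.6, 0.8]
print(estimate_grad_log_wmc(cnf, probs, n_samples=20_000))
print(exact_grad_log_wmc(cnf, probs))
```

The sampled estimate converges to the exact reference as the number of samples grows. The enumeration step is what a real implementation would replace with a weighted model sampler.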

Paper Structure

This paper contains 28 sections, 6 theorems, 22 equations, 5 figures, and 2 tables.

Key Result

Theorem 3.1

Computing the partial derivative of a WMC problem is reducible to WMC problems, and vice versa.
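One direction of Theorem 3.1 can be made concrete. WMC is linear in each variable's probability p_i, so its partial derivative equals the WMC with x_i forced true minus the WMC with x_i forced false, which amounts to two WMC calls. The following minimal Python sketch checks this against a central finite difference, using brute-force enumeration. The CNF encoding (DIMACS-style integer literals), the helper names, and the toy formula are illustrative assumptions. Enumeration is only feasible for a handful of variables, and the sketch does not cover the reverse reduction from WMC to derivatives.

```python
from itertools import product


def satisfies(cnf, assignment):
    """True if the assignment (index i = variable i+1) satisfies every clause."""
    return all(any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause) for clause in cnf)


def wmc(cnf, probs):
    """Brute-force WMC: sum over models of prod p_i (x_i true) * (1 - p_i) (x_i false)."""
    total = 0.0
    for assignment in product([False, True], repeat=len(probs)):
        if satisfies(cnf, assignment):
            w = 1.0
            for value, p in zip(assignment, probs):
                w *= p if value else 1.0 - p
            total += w
    return total


def set_prob(probs, i, value):
    """Copy of probs with p_i replaced by value."""
    updated = list(probs)
    updated[i] = value
    return updated


def grad_wmc(cnf, probs, i):
    """dWMC/dp_i via two WMC calls: x_i conditioned to True (p_i = 1) and to False (p_i = 0)."""
    return wmc(cnf, set_prob(probs, i, 1.0)) - wmc(cnf, set_prob(probs, i, 0.0))


cnf = [[1, 2], [-1, 3]]  # hypothetical toy formula: (x1 or x2) and (not x1 or x3)
probs = [0.3, 0.6, 0.8]

h = 1e-6
for i in range(len(probs)):
    up = set_prob(probs, i, probs[i] + h)
    down = set_prob(probs, i, probs[i] - h)
    finite_diff = (wmc(cnf, up) - wmc(cnf, down)) / (2 * h)
    print(i, grad_wmc(cnf, probs, i), finite_diff)
```

Setting p_i to 1 or 0 zeroes out the assignments with the other truth value, so each call computes a conditioned count without modifying the formula itself.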

Figures (5)

  • Figure 1: When sampling from the distribution of interpretations, we need to hit a model to obtain a gradient. (left): At initialization, the distribution over interpretations is fairly uniform and the probability that interpretation sampling finds a model is vanishingly small. Model sampling avoids this by sampling directly from the models. (right): When the neural network becomes more confident in its predictions, it becomes easier to sample a model.
  • Figure 2: First epoch of training on 4-digit MNIST-addition with exact inference. We also plot the cosine similarity between the exact gradients and the gradients sampled with the score function estimator (SFE). Results are averaged over 10 seeds.
  • Figure 3: Cumulative runtimes on all the MCC instances. Omitted methods achieve similar performance to the Product t-norm.
  • Figure 4: Maximum log-likelihood achieved by the various biased gradient approximations, sorted from best to worst. The benchmarks are 33 easy instances from the Model Counting Competitions. Higher is better.
  • Figure 5: Maximum log-likelihood achieved by the various biased gradient approximations, sorted from best to worst. The benchmarks are 33 easy instances from the Model Counting Competitions. Higher is better. All weights are initialized such that 90% of them are already correct with respect to a particular model.
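The point of Figure 1 can be quantified. When an interpretation is drawn by sampling each x_i independently from the network's probabilities p_i, the chance that it is a model of φ is exactly WMC(φ). The minimal Python sketch below estimates this hit rate by Monte Carlo on a hypothetical 20-variable formula with a single model (all variables true). It compares a uniform initialization (p_i = 0.5) with a confident one (p_i = 0.95). The exact values are 2^(−20) ≈ 9.5 × 10^(−7) and 0.95^20 ≈ 0.36. The formula, probabilities, and function names are illustrative assumptions, not the paper's benchmarks.

```python
import math
import random


def satisfies(cnf, assignment):
    """True if the assignment (index i = variable i+1) satisfies every clause."""
    return all(any(assignment[abs(lit) - 1] == (lit > 0) for lit in clause) for clause in cnf)


def sample_interpretation(probs, rng):
    """Interpretation sampling: each x_i is True independently with probability p_i."""
    return [rng.random() < p for p in probs]


def hit_rate(cnf, probs, n_samples, seed=0):
    """Fraction of sampled interpretations that are models (a Monte Carlo estimate of WMC)."""
    rng = random.Random(seed)
    hits = sum(satisfies(cnf, sample_interpretation(probs, rng)) for _ in range(n_samples))
    return hits / n_samples


n = 20
cnf = [[i] for i in range(1, n + 1)]  # hypothetical formula x1 and ... and x20: one model

for p in (0.5, 0.95):
    probs = [p] * n
    exact = math.prod(probs)  # WMC of this particular formula = P(all x_i true)
    print(p, exact, hit_rate(cnf, probs, 10_000))
```

With 10,000 samples, the uniform setting most likely finds no model at all, so an estimator that relies on hitting a model gets no gradient signal. The confident setting hits a model in roughly a third of the draws.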

Theorems & Definitions (21)

  • Definition 2.1: Weighted Model Count
  • Example 1
  • Theorem 3.1
  • proof
  • Definition 3.2
  • Theorem 3.3
  • proof
  • Theorem 4.1
  • proof
  • Definition 4.2
  • ...and 11 more
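Definition 2.1 is listed but not reproduced here. For orientation, the following is the standard weighted model count of a propositional formula φ, assuming each variable x_i is independently true with probability p_i, as is usual for probabilistic neurosymbolic models (the paper's own notation may differ). It is followed by the decomposition that makes a partial derivative reducible to WMC, one direction of Theorem 3.1. Here WMC(φ | x_i) denotes the count with x_i fixed to true and the remaining variables summed out.

```latex
\begin{aligned}
\mathrm{WMC}(\varphi) &= \sum_{\omega \models \varphi} \; \prod_{i:\, \omega_i = 1} p_i \prod_{i:\, \omega_i = 0} (1 - p_i) \\
\mathrm{WMC}(\varphi) &= p_i \, \mathrm{WMC}(\varphi \mid x_i) + (1 - p_i) \, \mathrm{WMC}(\varphi \mid \neg x_i) \\
\frac{\partial \, \mathrm{WMC}(\varphi)}{\partial p_i} &= \mathrm{WMC}(\varphi \mid x_i) - \mathrm{WMC}(\varphi \mid \neg x_i)
\end{aligned}
```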