
On the Independence Assumption in Neurosymbolic Learning

Emile van Krieken, Pasquale Minervini, Edoardo M. Ponti, Antonio Vergari

TL;DR

This work exposes how the standard conditional independence assumption in neurosymbolic learning biases models toward deterministic predictions and impedes uncertainty quantification. It develops a rigorous framework based on prime implicants and cubical sets that exactly characterises the set of feasible independent distributions and the topology of semantic-loss minima, showing that the loss is in general non-convex and its minima disconnected. By contrast with fully expressive distributions, it demonstrates that the limitations of independence can be overcome via expressive parameterisations, conditioning analyses, or mixtures, which restore representational capacity and lead to more favourable optimisation landscapes. The findings guide the design of more expressive neurosymbolic probabilistic models and motivate further study of tractable inference, constraints over continuous variables, and topology-informed regularisation to calibrate uncertainty.
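The claim that mixtures restore representational capacity can be checked on the traffic-light example of Figure 1. This is a minimal sketch, assuming two Boolean variables (r, g), hand-picked deterministic components, and equal mixture weights; the helper names `independent` and `mixture` are hypothetical and are not the paper's parameterisation. It compares a single independent distribution that has the same marginals as $p_2$ with a two-component mixture of independent distributions.

```python
from itertools import product

# Worlds are (r, g) truth assignments: r = red light on, g = green light on.
WORLDS = list(product([False, True], repeat=2))

def independent(p_r, p_g):
    """Distribution over worlds under conditional independence (hypothetical helper)."""
    return {
        (r, g): (p_r if r else 1 - p_r) * (p_g if g else 1 - p_g)
        for r, g in WORLDS
    }

def mixture(weights, components):
    """Convex combination of distributions over the same worlds (hypothetical helper)."""
    return {
        w: sum(a * comp[w] for a, comp in zip(weights, components))
        for w in WORLDS
    }

# Target p_2: half the mass on "only red", half on "only green".
p2 = {(True, False): 0.5, (False, True): 0.5, (False, False): 0.0, (True, True): 0.0}

# A single independent distribution with matching marginals p(r) = p(g) = 0.5
# necessarily puts 0.25 mass on the forbidden world r AND g.
single = independent(0.5, 0.5)

# A two-component mixture of deterministic independent distributions recovers p_2.
mix = mixture([0.5, 0.5], [independent(1.0, 0.0), independent(0.0, 1.0)])

print(single[(True, True)])  # 0.25
print(all(abs(mix[w] - p2[w]) < 1e-12 for w in WORLDS))  # True
```

A single product distribution is pinned down by its marginals, so matching $p(r)=p(g)=0.5$ forces mass $0.25$ onto $r \wedge g$; the mixture avoids this because each component can be deterministic on a different implicant.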

Abstract

State-of-the-art neurosymbolic learning systems use probabilistic reasoning to guide neural networks towards predictions that conform to logical constraints over symbols. Many such systems assume that the probabilities of the considered symbols are conditionally independent given the input to simplify learning and reasoning. We study and criticise this assumption, highlighting how it can hinder optimisation and prevent uncertainty quantification. We prove that loss functions bias conditionally independent neural networks to become overconfident in their predictions. As a result, they are unable to represent uncertainty over multiple valid options. Furthermore, we prove that these loss functions are difficult to optimise: they are non-convex, and their minima are usually highly disconnected. Our theoretical analysis gives the foundation for replacing the conditional independence assumption and designing more expressive neurosymbolic probabilistic models.
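The non-convexity claim can be illustrated numerically on the traffic-light constraint $\neg(r \wedge g)$ from Figure 1. The sketch below assumes the loss is viewed as a function of the per-variable marginals of an independent distribution; the function name `semantic_loss` and the encoding of the formula as a Python predicate are hypothetical choices. It evaluates the semantic loss at two minima and at their midpoint: a convex function could not exceed zero there. This only illustrates the claim on one example; the paper's formal statement may be phrased differently.

```python
import math
from itertools import product

def semantic_loss(phi, marginals):
    """-log P(phi) under an independent distribution with the given marginals."""
    names = list(marginals)
    prob_sat = 0.0
    for values in product([False, True], repeat=len(names)):
        world = dict(zip(names, values))
        if phi(world):
            weight = 1.0
            for n in names:
                weight *= marginals[n] if world[n] else 1 - marginals[n]
            prob_sat += weight
    return -math.log(prob_sat) if prob_sat > 0 else math.inf

# Traffic-light constraint: the red and green lights are never on together.
traffic = lambda w: not (w["r"] and w["g"])

loss_a = semantic_loss(traffic, {"r": 1.0, "g": 0.0})    # only red: a minimum
loss_b = semantic_loss(traffic, {"r": 0.0, "g": 1.0})    # only green: a minimum
loss_mid = semantic_loss(traffic, {"r": 0.5, "g": 0.5})  # their midpoint

# Convexity would require loss_mid <= (loss_a + loss_b) / 2 = 0.
print(f"{loss_mid:.4f}")                   # 0.2877
print(loss_mid > (loss_a + loss_b) / 2)    # True
```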

Paper Structure (27 sections, 13 theorems, 14 equations, 9 figures, 1 table)

This paper contains 27 sections, 13 theorems, 14 equations, 9 figures, 1 table.

Key Result

Theorem 4 (Implicants determine minima, informal)

An independent distribution $p_{\boldsymbol{\theta}}({\mathbf{w}})$ minimises the semantic loss if and only if it is deterministic for some variables, and those variables form an implicant of ${\varphi}$.
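Theorem 4 can be probed by brute force on a small formula. The sketch below assumes Boolean variables and a formula encoded as a Python predicate; the formula $(a \vee b) \wedge c$, the grid of marginals, and the helper names `semantic_loss` and `is_implicant` are all hypothetical choices. A variable is treated as deterministic when its marginal is exactly 0 or 1, and at every grid point the sketch compares two tests: whether the semantic loss is numerically zero, and whether the deterministic variables form an implicant. It only covers a finite grid, so it illustrates the theorem rather than proving it.

```python
import math
from itertools import product

def semantic_loss(phi, marginals):
    """-log P(phi) under an independent distribution with the given marginals."""
    names = list(marginals)
    prob_sat = 0.0
    for values in product([False, True], repeat=len(names)):
        world = dict(zip(names, values))
        if phi(world):
            weight = 1.0
            for n in names:
                weight *= marginals[n] if world[n] else 1 - marginals[n]
            prob_sat += weight
    return -math.log(prob_sat) if prob_sat > 0 else math.inf

def is_implicant(phi, partial, names):
    """True iff every completion of the partial assignment satisfies phi."""
    free = [n for n in names if n not in partial]
    return all(
        phi({**partial, **dict(zip(free, values))})
        for values in product([False, True], repeat=len(free))
    )

names = ["a", "b", "c"]
phi = lambda w: (w["a"] or w["b"]) and w["c"]

grid = [0.0, 0.25, 0.5, 0.75, 1.0]
mismatches = 0
for ps in product(grid, repeat=len(names)):
    marginals = dict(zip(names, ps))
    # Variables with marginal exactly 0 or 1 are treated as deterministic.
    deterministic = {n: p == 1.0 for n, p in marginals.items() if p in (0.0, 1.0)}
    is_minimum = semantic_loss(phi, marginals) < 1e-9
    if is_minimum != is_implicant(phi, deterministic, names):
        mismatches += 1

print(mismatches)  # 0
```

Checking the whole set of deterministic variables suffices: if some subset of them forms an implicant, then any larger deterministic assignment extending it is also an implicant.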

Figures (9)

  • Figure 1: The conditional independence assumption discards valid and potentially meaningful solutions. The tetrahedron (a 3-dimensional probability simplex) represents the distributions over the options of the problem in the traffic-lights example: $r$ refers to the red light and $g$ to the green light. The green triangle represents distributions that assign zero probability to $r\wedge g$. The blue lines are the distributions in the green triangle that an independent distribution can represent. The left (resp. right) blue line represents the distributions where the probability of $r$ (resp. $g$) is zero. Independent distributions cannot represent distributions on the dotted green line, such as $p_2$, which assigns equal probability to only the green light or only the red light being on.
  • Figure 2: The loss landscape of the semantic loss for the traffic-light problem: brighter (resp. darker) regions correspond to higher (resp. lower) semantic loss values.
  • Figure 3: The minimisation of the semantic loss on the traffic-light problem for independent distributions (left) and expressive distributions (right). The initial distributions hold impossible beliefs, with $p(r, g)=0.7$; they are plotted in the top-left triangle and in the top triangle within the tetrahedron. The resulting minima, with $p(r, g)=0$, lie in the bottom triangle. The minima reached under the independence assumption are as predicted by the characterisation of possible independent distributions via implicants. The minima of the expressive parameterisation cover different areas of the bottom triangle, but lie close to its vertices.
  • Figure 4: Plots of neurosymbolic loss functions for the formula $\neg r \vee \neg g$ under several t-norms. Left: product t-norm, computed as $-\log p_{\boldsymbol{\theta}}({\varphi}|{\mathbf{x}})$, which coincides with the semantic loss. Center: one minus the Gödel t-conorm, $1 - \max(1-r, 1-g)$. Right: one minus the Łukasiewicz t-conorm, $1 - \min(1, 2 - r-g)$.
  • Figure 5: The full 3-simplex over possible worlds, with the set of possible independent distributions in blue, for the formula discussed in the appendix on minimal covers. The $\mathcal{P}_{\phi}$ labels denote the set of distributions characterised by the implicant $\phi$, as defined in the proposition that expresses this set as a union over implicants.
  • ...and 4 more figures
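The numerical claims in the Figure 3 and Figure 4 captions can be reproduced in miniature for the constraint $\neg r \vee \neg g$. This is a sketch under stated assumptions, not the paper's experimental setup: plain gradient descent on logits with learning rate 1 for 2000 steps, a hand-picked initialisation with $p(r, g)=0.7$, and hypothetical helper names (`train_independent`, `train_expressive`, and the four loss functions). The independent model uses one sigmoid per light; the expressive model uses one softmax logit per world.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

# Figure 4: losses for phi = (not r) or (not g), with r = p(r) and g = p(g).
def semantic_loss(r, g):
    # -log of the independent probability mass on the three worlds satisfying phi.
    return -math.log((1 - r) * (1 - g) + r * (1 - g) + (1 - r) * g)

def product_loss(r, g):
    # Product t-conorm a + b - a*b applied to a = 1 - r, b = 1 - g.
    a, b = 1 - r, 1 - g
    return -math.log(a + b - a * b)

def godel_loss(r, g):
    return 1 - max(1 - r, 1 - g)

def lukasiewicz_loss(r, g):
    return 1 - min(1, 2 - r - g)

# Figure 3: gradient descent on the semantic loss, starting from p(r, g) = 0.7.
def train_independent(steps=2000, lr=1.0):
    a, b = math.log(7.0), math.log(4.0)  # sigmoid(a) = 0.875, sigmoid(b) = 0.8
    for _ in range(steps):
        pr, pg = sigmoid(a), sigmoid(b)
        denom = 1 - pr * pg
        # Analytic gradients of -log(1 - pr * pg) with respect to the logits.
        ga = pr * (1 - pr) * pg / denom
        gb = pg * (1 - pg) * pr / denom
        a, b = a - lr * ga, b - lr * gb
    return sigmoid(a), sigmoid(b)

def softmax(z):
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]

def train_expressive(steps=2000, lr=1.0):
    # One logit per world, ordered (r g, r ~g, ~r g, ~r ~g).
    z = [math.log(p) for p in (0.7, 0.1, 0.1, 0.1)]
    for _ in range(steps):
        q = softmax(z)
        # Gradient of -log(1 - q[0]): q[0] for the forbidden world r g,
        # and -q[0] * q[k] / (1 - q[0]) for every other world k.
        grad = [q[0]] + [-q[0] * qk / (1 - q[0]) for qk in q[1:]]
        z = [v - lr * gv for v, gv in zip(z, grad)]
    return softmax(z)

pr, pg = train_independent()
q = train_expressive()
print(f"independent: p(r)={pr:.4f}, p(g)={pg:.4f}")
print("expressive:", [round(x, 4) for x in q])
```

With this initialisation, both runs drive the mass on $r \wedge g$ towards zero, but the independent run ends with one light almost deterministically off, whereas the expressive run keeps substantial mass on both single-light worlds. The semantic loss and the product-t-conorm loss agree because, under independence, $p(\neg r \vee \neg g) = 1 - p(r)\,p(g)$.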

Theorems & Definitions (36)

  • Example 1
  • Example 2: Learning with algorithms
  • Example 3: Semi-supervised learning with constraints
  • Theorem 4: Implicants determine minima, informal
  • Theorem 5: Convexity, informal
  • Theorem 6: Connectedness, informal
  • Definition 7: Deterministic assignments
  • Definition 8: Implicants
  • Theorem 9: Implicants determine possible independent distributions
  • Proof
  • ...and 26 more
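Definitions 7 and 8 introduce deterministic assignments and implicants, and the framework relies on prime implicants. The sketch below enumerates prime implicants by brute force, assuming Boolean variables and a formula encoded as a Python predicate; the helper names `is_implicant` and `prime_implicants` are hypothetical, and the paper's formal definitions (for example via cubes) may be phrased differently. The cost is exponential in the number of variables, so it is only meant for small examples.

```python
from itertools import combinations, product

def is_implicant(phi, partial, names):
    """True iff every completion of the partial assignment satisfies phi."""
    free = [n for n in names if n not in partial]
    return all(
        phi({**partial, **dict(zip(free, values))})
        for values in product([False, True], repeat=len(free))
    )

def prime_implicants(phi, names):
    """Brute-force enumeration of prime implicants, smallest first."""
    primes = []
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            for values in product([False, True], repeat=k):
                partial = dict(zip(subset, values))
                if not is_implicant(phi, partial, names):
                    continue
                # Implicants are visited by increasing size, so a non-prime one
                # always contains a prime implicant that was already recorded.
                if any(p.items() <= partial.items() for p in primes):
                    continue
                primes.append(partial)
    return primes

traffic = lambda w: not (w["r"] and w["g"])
print(prime_implicants(traffic, ["r", "g"]))
# [{'r': False}, {'g': False}]

phi2 = lambda w: (w["a"] or w["b"]) and w["c"]
print(prime_implicants(phi2, ["a", "b", "c"]))
# [{'a': True, 'c': True}, {'b': True, 'c': True}]
```

For the traffic-light constraint the two prime implicants, "red off" and "green off", correspond to the two blue lines in Figure 1.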