On Improving Neurosymbolic Learning by Exploiting the Representation Space

Aaditya Naik, Efthymia Tsamoura, Shibo Jin, Mayur Naik, Dan Roth

TL;DR

This work addresses learning in neurosymbolic settings where the hidden labels of the inputs must satisfy a logical formula, so the number of candidate pre-images (label combinations that satisfy the formula) can grow exponentially. It introduces CLIPPER, a method based on integer linear programming (ILP) that builds a proximity graph over latent representations and prunes inconsistent pre-images while guaranteeing that every training sample keeps at least one valid pre-image. The approach plugs into existing neurosymbolic learning (NSL) engines and supports both frozen and trainable encoders. It improves accuracy across 16 benchmarks, including visual reasoning and video-to-text alignment tasks. By coupling representation learning with a principled pruning mechanism, CLIPPER offers encoder-agnostic improvements and a scalable route to tighter supervision in neurosymbolic systems.

Abstract

We study the problem of learning neural classifiers in a neurosymbolic setting where the hidden gold labels of input instances must satisfy a logical formula. Learning in this setting proceeds by first computing (a subset of) the possible combinations of labels that satisfy the formula and then computing a loss using those combinations and the classifiers' scores. One challenge is that the space of label combinations can grow exponentially, making learning difficult. We propose a technique that prunes this space by exploiting the intuition that instances with similar latent representations are likely to share the same label. While this intuition has been widely used in weakly supervised learning, its application in our setting is challenging due to label dependencies imposed by logical constraints. We formulate the pruning process as an integer linear program that discards inconsistent label combinations while respecting logical structure. Our approach, CLIPPER, is orthogonal to existing training algorithms and can be seamlessly integrated with them. Across 16 benchmarks over complex neurosymbolic tasks, we demonstrate that CLIPPER boosts the performance of state-of-the-art neurosymbolic engines like Scallop, Dolphin, and ISED by up to 48%, 53%, and 8%, respectively, leading to state-of-the-art accuracies.
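
The two steps described in the abstract, enumerating the label combinations that satisfy a formula and then discarding combinations that disagree with representation-space neighbors, can be illustrated with a small sketch. Everything here is an assumption made for illustration: the digit-sum formula, the instance names, the hand-written proximity edge, and the helper functions pre_images, agrees, and prune. The pruning step is a simple pairwise filter, not the integer linear program that CLIPPER solves.

```python
from itertools import product

def pre_images(num_inputs, classes, formula):
    """Enumerate every label tuple that satisfies the formula (brute force)."""
    return [labels for labels in product(classes, repeat=num_inputs)
            if formula(labels)]

def agrees(sample_a, image_a, sample_b, image_b, edges):
    """True if the two pre-images give equal labels to every linked pair."""
    label_a = dict(zip(sample_a, image_a))
    label_b = dict(zip(sample_b, image_b))
    for u, v in edges:
        for x, y in ((u, v), (v, u)):
            if x in label_a and y in label_b and label_a[x] != label_b[y]:
                return False
    return True

def prune(candidates, edges):
    """Drop pre-images that no pre-image of another sample agrees with.

    candidates maps a tuple of instance ids to its list of pre-images.
    edges holds pairs of instance ids assumed to share a label.
    Simplified stand-in for the paper's ILP (hypothetical helper).
    """
    pruned = {s: list(images) for s, images in candidates.items()}
    changed = True
    while changed:
        changed = False
        for s in pruned:
            for t in pruned:
                if s == t:
                    continue
                kept = [a for a in pruned[s]
                        if any(agrees(s, a, t, b, edges) for b in pruned[t])]
                # Only shrink to a non-empty set, so each sample keeps
                # at least one pre-image.
                if kept and len(kept) < len(pruned[s]):
                    pruned[s] = kept
                    changed = True
    return pruned

digits = range(10)
# Assumed toy task: two digit images per sample, supervised by their sum.
candidates = {
    ("a1", "a2"): pre_images(2, digits, lambda labels: sum(labels) == 2),
    ("b1", "b2"): pre_images(2, digits, lambda labels: sum(labels) == 0),
}
# Assumed proximity edge: a1 and b1 have similar latent representations.
edges = {("a1", "b1")}
pruned = prune(candidates, edges)
print(candidates[("a1", "a2")])  # three candidates before pruning
print(pruned[("a1", "a2")])      # one candidate after pruning
```

In this toy case the second sample has a single pre-image, (0, 0), so the edge between a1 and b1 forces a1 to take label 0 and only (0, 2) survives for the first sample. The brute-force enumeration scans all k^n label tuples for n inputs and k classes, which is the exponential growth the abstract refers to. The filter only compares instances across different samples.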

Paper Structure

This paper contains 12 sections, 2 theorems, 2 equations, 7 tables, 1 algorithm.

Key Result

Proposition 3.6

(Optimality) The solution to the integer linear program (eq:lattice-graph-ilp) is the optimal solution of the pruning problem.

Theorems & Definitions (11)

  • Example 1.1: NeSy example
  • Example 1.2
  • Example 2.1
  • Definition 3.1: Proximity graphs
  • Definition 3.2: Consistency
  • Example 3.3: Continuation of Example \ref{example:NSL3}
  • Definition 3.4: Pruning
  • Example 3.5: Continuation of Example \ref{example:NSL4}
  • Proposition 3.6
  • Proposition 2.0
  • ...and 1 more