Semantic Loss Functions for Neuro-Symbolic Structured Prediction

Kareem Ahmed; Stefano Teso; Paolo Morettin; Luca Di Liello; Pierfrancesco Ardino; Jacopo Gobbi; Yitao Liang; Eric Wang; Kai-Wei Chang; Andrea Passerini; Guy Van den Broeck

TL;DR

This paper presents semantic loss as a principled, differentiable way to enforce symbolic structure in neural networks for structured output prediction. By compiling Boolean constraints into tractable logical circuits, the method computes the loss as the negative logarithm of a weighted model count, i.e., the total probability the network assigns to the assignments (models) that satisfy the constraint. A companion neuro-symbolic entropy term further biases the network toward predictions that are both valid and confident. The approach is extended to generative modeling through Constrained Adversarial Networks (CANs), enabling the generation of valid, structured objects such as game levels and molecules. Empirical results across semi- and fully-supervised tasks show improved coherence and constraint satisfaction, while CANs efficiently generate structurally valid objects and allow constraints to be switched at inference time. Overall, the framework offers a modular, scalable path to integrating rich symbolic knowledge into both discriminative and generative deep models, with strong empirical gains.
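
To make the weighted model count concrete, the sketch below computes the semantic loss by brute-force enumeration. It assumes the network outputs independent Bernoulli probabilities and that the constraint is given as a Python predicate; the names `semantic_loss` and `exactly_one` and this predicate interface are hypothetical choices for illustration, not the paper's implementation. Enumeration is exponential in the number of outputs, which is precisely why the method compiles constraints into logical circuits instead.

```python
import itertools
import math


def semantic_loss(probs, constraint):
    """Brute-force semantic loss (hypothetical helper): -log P(constraint).

    probs: per-output probabilities p_i = P(x_i = True), assumed independent.
    constraint: predicate over a tuple of booleans (illustrative interface).
    Enumerates all 2^n assignments, so it is only usable for small n.
    """
    wmc = 0.0  # weighted model count: total probability of satisfying assignments
    for x in itertools.product([False, True], repeat=len(probs)):
        if constraint(x):
            weight = 1.0
            for p, xi in zip(probs, x):
                weight *= p if xi else (1.0 - p)
            wmc += weight
    return -math.log(wmc)  # raises ValueError if no model has positive weight


def exactly_one(x):
    """Hypothetical example constraint: exactly one output is true (one-hot)."""
    return sum(x) == 1
```

For instance, `semantic_loss([0.8, 0.1, 0.1], exactly_one)` sums the probabilities of the three one-hot vectors, $0.648 + 0.018 + 0.018 = 0.684$, and returns $-\log 0.684 \approx 0.38$; a prediction that puts all its mass on a single valid one-hot vector incurs zero loss.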

Abstract

Structured output prediction problems are ubiquitous in machine learning. The prominent approach leverages neural networks as powerful feature extractors, otherwise assuming the independence of the outputs. These outputs, however, jointly encode an object, e.g., a path in a graph, and are therefore related through the structure underlying the output space. We discuss the semantic loss, which injects knowledge about such structure, defined symbolically, into training by minimizing the network's violation of such dependencies, steering the network towards predicting distributions satisfying the underlying structure. At the same time, it is agnostic to the arrangement of the symbols and depends only on the semantics they express, while also enabling efficient end-to-end training and inference. We also discuss key improvements and applications of the semantic loss. One limitation of the semantic loss is that it does not exploit the association of every data point with certain features certifying its membership in a target class. We should therefore prefer minimum-entropy distributions over valid structures, which we obtain by additionally minimizing the neuro-symbolic entropy. We empirically demonstrate the benefits of this more refined formulation. Moreover, the semantic loss is designed to be modular and can be combined with both discriminative and generative neural models. This is illustrated by integrating it into generative adversarial networks, yielding constrained adversarial networks, a novel class of deep generative models able to efficiently synthesize complex objects obeying the structure of the underlying domain.
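
The neuro-symbolic entropy is the entropy of the network's output distribution after conditioning on the constraint, i.e., restricted to the constraint's models and renormalized by their total probability. The brute-force sketch below contrasts it with plain entropy regularization, taken here as the sum of independent per-output Bernoulli entropies. As before, the function names and the predicate-based constraint are hypothetical illustration choices, and enumeration is exponential in the number of outputs.

```python
import itertools
import math


def model_weights(probs, constraint):
    """Probabilities p(x) of all assignments x that satisfy the constraint (brute force)."""
    weights = []
    for x in itertools.product([False, True], repeat=len(probs)):
        if constraint(x):
            w = 1.0
            for p, xi in zip(probs, x):
                w *= p if xi else (1.0 - p)
            weights.append(w)
    return weights


def neuro_symbolic_entropy(probs, constraint):
    """Entropy of the output distribution conditioned on the constraint (hypothetical helper)."""
    weights = model_weights(probs, constraint)
    z = sum(weights)  # P(constraint): the weighted model count inside the semantic loss
    return -sum((w / z) * math.log(w / z) for w in weights if w > 0)


def bernoulli_entropy(probs):
    """Plain entropy regularization: sum of per-output entropies, ignoring the constraint."""
    return -sum(p * math.log(p) + (1 - p) * math.log(1 - p) for p in probs if 0 < p < 1)


def exactly_one(x):
    """Hypothetical example constraint: exactly one output is true (one-hot)."""
    return sum(x) == 1
```

With uniform outputs `[0.5, 0.5, 0.5]` and the exactly-one constraint, the neuro-symbolic entropy is $\log 3$ (three equally likely valid one-hot vectors), whereas the plain entropy is $3 \log 2$ because it also accounts for the mass placed on the five invalid assignments. Minimizing the former pushes the network towards one confident, valid prediction; minimizing the latter only towards confidence.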

Paper Structure (29 sections, 3 theorems, 16 equations, 4 figures, 5 tables, 2 algorithms)

Semantic Loss Functions for Neuro-Symbolic Structured Prediction
Introduction
Notation
Semantic Loss
Tractable Computation through Knowledge Compilation
Logical Circuits
Structural Properties
Neuro-Symbolic Entropy Regularization
Motivation and Definition
Defining the Loss
Computing the Loss
Base Case: $\alpha$ is a literal
Recursive Case: $\alpha$ is a conjunction
Recursive Case: $\alpha$ is a disjunction
An Illustrative example
...and 14 more sections

Key Result

Theorem 1

If $g$ and $d$ are non-parametric and the leftmost expectation in Eq. eq:gan is approximated arbitrarily well by the data, the global equilibrium $(g^*, d^*)$ of Eq. eq:gan satisfies $\mathrm{P}_{d^*} \equiv \frac{1}{2}$ and $\mathrm{P}_{g^*}$ …
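
The value $\frac{1}{2}$ can be traced to the optimal discriminator of a standard GAN. The derivation below is a sketch that assumes Eq. eq:gan is the standard minimax objective of Goodfellow et al., which the attribution of this theorem to goodfellow2014generative suggests but the text here does not show; the symbols $\mathrm{P}_{\text{data}}$, $\mathrm{P}_z$ and $V$ are notation chosen for this illustration.

```latex
% Assumed form of Eq. eq:gan (standard GAN value function)
\min_g \max_d \; V(g, d) = \mathbb{E}_{x \sim \mathrm{P}_{\text{data}}}\big[\log d(x)\big]
  + \mathbb{E}_{z \sim \mathrm{P}_z}\big[\log\big(1 - d(g(z))\big)\big]

% For a fixed g, write the second term over generated samples x = g(z) ~ P_g and
% maximize pointwise: for a, b > 0, a \log y + b \log(1 - y) peaks at y = a / (a + b), so
d^*(x) = \frac{\mathrm{P}_{\text{data}}(x)}{\mathrm{P}_{\text{data}}(x) + \mathrm{P}_g(x)}

% Hence d^*(x) = 1/2 for every x exactly when P_g = P_data.
```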

Figures (4)

Figure 1: A network's predictive distribution can be uncertain or certain ($\leftrightarrow$), and it can allow or disallow invalid predictions under the constraint $\alpha$ ($\updownarrow$). Entropy regularization steers the network towards confident, possibly invalid predictions (b). Neuro-symbolic learning steers the network towards valid predictions without necessarily being confident (c). Neuro-symbolic entropy-regularization guides the network to valid and confident predictions (d).
Figure 2: For a given data point, the network (middle) outputs a distribution over classes $A$, $B$ and $C$, highlighted in blue, green and red, respectively. The circuit encodes the constraint $(A \land B) \implies C$. For each leaf node $l$, we plug in $\mathrm{P}(l)$ and $1 - \mathrm{P}(l)$ for positive and negative literals, respectively. The computation proceeds bottom-up, taking products at AND gates and summations at OR gates. The value accumulated at the root of the circuit (left) is the probability allocated by the network to the constraint. The weights accumulated on edges from OR gates to their children are of special significance: OR nodes induce a partitioning of the distribution's support, and the weights correspond to the mass allocated by the network to each mutually exclusive event. A second upward pass, in which the entropy of an OR node is the entropy of the distribution over its children plus the expected entropy of its children, and the entropy of an AND node is the sum of its children's entropies, yields the entropy of the distribution over the constraint's models: the neuro-symbolic entropy regularization loss (right).
Figure 3: Warcraft dataset. Each input (left) is a $12 \times 12$ grid corresponding to a Warcraft II terrain map; the output is a matrix (middle) indicating the shortest path from the top-left to the bottom-right corner (right).
Figure 4: Examples of Super Mario Bros. (SMB) levels generated by GAN and CAN. Left: generating levels containing pipes; right: generating reachable levels. For each of the two settings we report prototypical examples of levels generated by GAN (first and third picture) and CAN (second and fourth picture). Notice that all pipes generated by CAN are valid, unlike those generated by GAN, and that GAN generates a level that is not playable because of the large jump at the start of the map.
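
The two upward passes described in the Figure 2 caption can be sketched as a single recursive evaluation. The tuple-based node representation, the helper names `lit`, `either` and `evaluate`, and the particular circuit built for $(A \land B) \implies C$ are assumptions of this illustration, not the paper's implementation. The recurrences require a smooth, decomposable and deterministic circuit: at a decomposable AND gate the children mention disjoint sets of independent variables, so probabilities multiply and entropies add; at a deterministic OR gate the children are mutually exclusive, so probabilities add and the entropy follows the chain rule stated in the caption.

```python
import math

# Hypothetical node representation for this sketch:
#   ("lit", name, positive) | ("and", [children]) | ("or", [children])


def lit(name, positive=True):
    return ("lit", name, positive)


def either(name):
    # Smoothing node (v OR NOT v): always true, but keeps v in scope so its entropy is counted.
    return ("or", [lit(name), lit(name, False)])


def evaluate(node, probs):
    """Return (probability, entropy) of a smooth, decomposable, deterministic circuit.

    probs maps each variable name to P(var = True) under the network.
    """
    kind = node[0]
    if kind == "lit":
        _, name, positive = node
        return (probs[name] if positive else 1.0 - probs[name]), 0.0
    results = [evaluate(child, probs) for child in node[1]]
    if kind == "and":
        # Decomposable AND: probabilities multiply, entropies add.
        return math.prod(p for p, _ in results), sum(h for _, h in results)
    # Deterministic OR: probabilities add; entropy = H(branch choice) + E[branch entropy].
    prob = sum(p for p, _ in results)
    entropy = 0.0
    for p, h in results:
        if p > 0:
            theta = p / prob  # mass on this mutually exclusive branch, given the OR node
            entropy += theta * (h - math.log(theta))
    return prob, entropy


# (A and B) => C, split into the mutually exclusive cases
# not A | (A and not B) | (A and B and C), each smoothed over all three variables.
circuit = ("or", [
    ("and", [lit("A", False), either("B"), either("C")]),
    ("and", [lit("A"), lit("B", False), either("C")]),
    ("and", [lit("A"), lit("B"), lit("C")]),
])

prob, entropy = evaluate(circuit, {"A": 0.6, "B": 0.7, "C": 0.2})
semantic_loss = -math.log(prob)
```

Here `prob` equals $1 - 0.6 \cdot 0.7 \cdot 0.8 = 0.664$, the probability the network assigns to the constraint, and `entropy` matches the entropy of the network's distribution restricted to the constraint's five models, computed in one pass instead of by enumeration.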

Theorems & Definitions (3)

Theorem 1 (Goodfellow et al., 2014)
Corollary 1
Proposition 1
