Table of Contents
Fetching ...

Symbol Correctness in Deep Neural Networks Containing Symbolic Layers

Aaron Bembenek, Toby Murray

TL;DR

This work formalizes symbol correctness as the alignment between neural-to-symbol predictions at the neural-symbolic boundary and the ground-truth symbolic representation, arguing that it is essential for explainability and transfer learning in NS-DNNs. It develops a formal NS-DNN model ⟨f_θ, g, p⟩, defines symbol and output correctness, and proves that symbol correctness cannot be guaranteed to be trainable from output supervision alone. To study training dynamics, the authors introduce an ideal synthesizer and three practical synthesizers (Autodiff, Closest, Random) within a unified training framework, and analyze how they reconcile neural beliefs with symbolic possibilities. Through experiments on a visual addition task with Datalog-based symbolic layers, they show that high output accuracy does not guarantee symbol correctness, and that synthesizer choice critically shapes symbol learning, especially under data distribution shifts. The results motivate future NS-DNN training strategies that leverage symbol correctness to improve explainability and transferability, including curriculum approaches, biased data, and partially labeled symbols.

Abstract

To handle AI tasks that combine perception and logical reasoning, recent work introduces Neurosymbolic Deep Neural Networks (NS-DNNs), which contain -- in addition to traditional neural layers -- symbolic layers: symbolic expressions (e.g., SAT formulas, logic programs) that are evaluated by symbolic solvers during inference. We identify and formalize an intuitive, high-level principle that can guide the design and analysis of NS-DNNs: symbol correctness, the correctness of the intermediate symbols predicted by the neural layers with respect to a (generally unknown) ground-truth symbolic representation of the input data. We demonstrate that symbol correctness is a necessary property for NS-DNN explainability and transfer learning (despite being in general impossible to train for). Moreover, we show that the framework of symbol correctness provides a precise way to reason and communicate about model behavior at neural-symbolic boundaries, and gives insight into the fundamental tradeoffs faced by NS-DNN training algorithms. In doing so, we both identify significant points of ambiguity in prior work, and provide a framework to support further NS-DNN developments.

Symbol Correctness in Deep Neural Networks Containing Symbolic Layers

TL;DR

This work formalizes symbol correctness as the alignment between neural-to-symbol predictions at the neural-symbolic boundary and the ground-truth symbolic representation, arguing that it is essential for explainability and transfer learning in NS-DNNs. It develops a formal NS-DNN model ⟨f_θ, g, p⟩, defines symbol and output correctness, and proves that symbol correctness cannot be guaranteed to be trainable from output supervision alone. To study training dynamics, the authors introduce an ideal synthesizer and three practical synthesizers (Autodiff, Closest, Random) within a unified training framework, and analyze how they reconcile neural beliefs with symbolic possibilities. Through experiments on a visual addition task with Datalog-based symbolic layers, they show that high output accuracy does not guarantee symbol correctness, and that synthesizer choice critically shapes symbol learning, especially under data distribution shifts. The results motivate future NS-DNN training strategies that leverage symbol correctness to improve explainability and transferability, including curriculum approaches, biased data, and partially labeled symbols.

Abstract

To handle AI tasks that combine perception and logical reasoning, recent work introduces Neurosymbolic Deep Neural Networks (NS-DNNs), which contain -- in addition to traditional neural layers -- symbolic layers: symbolic expressions (e.g., SAT formulas, logic programs) that are evaluated by symbolic solvers during inference. We identify and formalize an intuitive, high-level principle that can guide the design and analysis of NS-DNNs: symbol correctness, the correctness of the intermediate symbols predicted by the neural layers with respect to a (generally unknown) ground-truth symbolic representation of the input data. We demonstrate that symbol correctness is a necessary property for NS-DNN explainability and transfer learning (despite being in general impossible to train for). Moreover, we show that the framework of symbol correctness provides a precise way to reason and communicate about model behavior at neural-symbolic boundaries, and gives insight into the fundamental tradeoffs faced by NS-DNN training algorithms. In doing so, we both identify significant points of ambiguity in prior work, and provide a framework to support further NS-DNN developments.
Paper Structure (40 sections, 4 equations, 7 figures, 2 tables)

This paper contains 40 sections, 4 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: This NS-DNN for visual addition predicts the sum of two handwritten digits. Given images of handwritten digits, neural layers predict the digits' values. At the neural-symbolic boundary, these predictions are instantiated as symbolic digits, which are bound to variables $a$ and $b$ in the symbolic layer $a + b = c$. A solver is used to evaluate the symbolic layer, assigning the value 8 to the output variable $c$. The NS-DNN is symbol correct on the example input: the predicted symbols 3 and 5 match the (unprovided) ground-truth symbols corresponding to the handwritten digits. If instead the NS-DNN predicted the symbolic digits to be a pair of 4's, it would still be output correct (since 4 + 4 = 3 + 5), but not symbol correct.
  • Figure 2: Training on an input $x$, a synthesizer$\Sigma(g, p)$ is responsible for building loss gradients $G_z$ from the current neural predictions $z \triangleq f_\theta(x)$, training label $y$, grounding function $g$, and symbolic program $p$.
  • Figure 3: Each synthesizer can learn a model with much higher output accuracy than symbol accuracy. These heatmaps show the mapping from handwritten digits to mathematical digits in the most extreme model (i.e., having the largest gap between output and symbol accuracy) learned by each synthesizer; we report output accuracy vs symbol accuracy in parentheses.
  • Figure 4: These empirical cumulative density plots show the proportion of cases for each synthesizer that achieved a given output and symbol accuracy in Experiment #1. Closest demonstrates the highest variability, while Multiple is bimodal, with a small number of cases of no symbol accuracy (mean: 0.00) and middling output accuracy (mean: 0.48). The results for Random appear as a vertical line as all cases achieved roughly the same high output and symbol accuracy (mean: 0.98).
  • Figure 5: In Experiment #1, Closest learned models with four varieties of symbol confusion (Table \ref{['tab:max_sat']}), three of which are on display here (Class A is displayed in Figure \ref{['fig:max_sat']}). For the heatmaps, we chose an (arbitrary) representative model for each class. Class B models incorrectly map three digits; Class C models, two digits; and Class D models, one digit.
  • ...and 2 more figures

Theorems & Definitions (5)

  • Definition 3.1: NS-DNN
  • Definition 3.2: Output correct
  • Definition 3.3: Symbol correct
  • Example 1: Xor
  • Example 2: Datalog-based visual addition