Symbol Correctness in Deep Neural Networks Containing Symbolic Layers
Aaron Bembenek, Toby Murray
TL;DR
This work formalizes symbol correctness as the agreement between the symbols predicted by the neural layers at the neural-symbolic boundary and the ground-truth symbolic representation of the input, and argues that it is essential for explainability and transfer learning in NS-DNNs. It develops a formal NS-DNN model ⟨f_θ, g, p⟩, defines symbol correctness and output correctness, and proves that training on output supervision alone cannot, in general, guarantee symbol correctness. To study training dynamics, the authors introduce an ideal synthesizer and three practical synthesizers (Autodiff, Closest, Random) within a unified training framework, and analyze how each reconciles neural beliefs with symbolic possibilities. Experiments on a visual addition task with Datalog-based symbolic layers show that high output accuracy does not guarantee symbol correctness, and that the choice of synthesizer critically shapes which symbols are learned, especially under data distribution shifts. The results motivate NS-DNN training strategies that leverage symbol correctness to improve explainability and transferability, including curriculum approaches, biased data, and partially labeled symbols.
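As a hypothetical illustration of what it means for a synthesizer to reconcile neural beliefs with symbolic possibilities, the Python sketch below uses the visual addition task, substitutes plain integer addition for the Datalog symbolic layer, and enumerates the digit pairs consistent with an observed sum. The selection rules are assumptions, not the paper's definitions: `closest_synth` picks the consistent pair the network currently rates most likely, and `random_synth` picks one uniformly at random. The Autodiff synthesizer is not modeled, and all function names are hypothetical.

```python
import random
from itertools import product
from typing import Callable, List, Sequence, Tuple

Pair = Tuple[int, int]

def add(a: int, b: int) -> int:
    """Stand-in symbolic layer: plain integer addition (the paper uses Datalog)."""
    return a + b

def consistent_symbols(label: int, g: Callable[[int, int], int], n_digits: int = 10) -> List[Pair]:
    """All digit pairs that the symbolic layer g maps to the observed output label."""
    return [(a, b) for a, b in product(range(n_digits), repeat=2) if g(a, b) == label]

def closest_synth(probs_a: Sequence[float], probs_b: Sequence[float], label: int,
                  g: Callable[[int, int], int]) -> Pair:
    """Assumed 'Closest'-style rule: the consistent pair with the highest joint neural probability."""
    candidates = consistent_symbols(label, g)
    return max(candidates, key=lambda ab: probs_a[ab[0]] * probs_b[ab[1]])

def random_synth(label: int, g: Callable[[int, int], int], rng: random.Random) -> Pair:
    """Assumed 'Random'-style rule: any consistent pair, chosen uniformly."""
    return rng.choice(consistent_symbols(label, g))

# The network is confident the first image shows a 3 but unsure about the second.
probs_a = [0.01] * 10
probs_a[3] = 0.91
probs_b = [0.1] * 10
print(closest_synth(probs_a, probs_b, label=5, g=add))  # (3, 2)
print(random_synth(5, add, random.Random(0)))           # some pair that sums to 5
```

Under these assumed rules, a Closest-style target reinforces whatever the network already believes (helpful if those beliefs are right, entrenching if they are wrong), while a Random-style target ignores the network's beliefs entirely.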
Abstract
To handle AI tasks that combine perception and logical reasoning, recent work has introduced Neurosymbolic Deep Neural Networks (NS-DNNs), which contain, in addition to traditional neural layers, symbolic layers: symbolic expressions (e.g., SAT formulas, logic programs) that are evaluated by symbolic solvers during inference. We identify and formalize an intuitive, high-level principle that can guide the design and analysis of NS-DNNs: symbol correctness, the correctness of the intermediate symbols predicted by the neural layers with respect to a (generally unknown) ground-truth symbolic representation of the input data. We demonstrate that symbol correctness is a necessary property for NS-DNN explainability and transfer learning, even though it is in general impossible to train for. Moreover, we show that the framework of symbol correctness provides a precise way to reason about and communicate model behavior at neural-symbolic boundaries, and gives insight into the fundamental tradeoffs faced by NS-DNN training algorithms. In doing so, we both identify significant points of ambiguity in prior work and provide a framework to support further NS-DNN developments.
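To make the gap between symbol correctness and output correctness concrete, the Python sketch below uses a hypothetical toy setup, not one taken from the paper: the neural layer is modeled as a function from each image's true digit to a predicted digit, and the symbolic layer is integer addition. If every training pair happens to sum to 9, a layer that predicts 9 - d for true digit d gets every output right and every symbol wrong, and its output accuracy collapses once the data distribution shifts to all digit pairs. The names `identity`, `flipped`, and `accuracies` are illustrative only.

```python
from itertools import product
from typing import Callable, List, Tuple

Pair = Tuple[int, int]

def g(a: int, b: int) -> int:
    """Symbolic layer: integer addition."""
    return a + b

def identity(d: int) -> int:
    """Hypothetical symbol-correct neural layer: predicts the true digit."""
    return d

def flipped(d: int) -> int:
    """Hypothetical neural layer that predicts 9 - d instead of the true digit d."""
    return 9 - d

def accuracies(f: Callable[[int], int], pairs: List[Pair]) -> Tuple[float, float]:
    """Return (symbol accuracy, output accuracy) of neural layer f on (a, b) pairs."""
    symbol_ok = sum(f(a) == a and f(b) == b for a, b in pairs)
    output_ok = sum(g(f(a), f(b)) == g(a, b) for a, b in pairs)
    return symbol_ok / len(pairs), output_ok / len(pairs)

train = [(a, 9 - a) for a in range(10)]    # biased: every pair sums to 9
test = list(product(range(10), repeat=2))  # shifted: all 100 digit pairs

for name, f in [("identity", identity), ("flipped", flipped)]:
    print(name, "train", accuracies(f, train), "test", accuracies(f, test))
# identity train (1.0, 1.0) test (1.0, 1.0)
# flipped train (0.0, 1.0) test (0.0, 0.1)
```

On the biased training set the two layers are indistinguishable by output accuracy, which is the sense in which output supervision alone cannot pin down the intended symbols; only the symbol-correct layer transfers to the shifted distribution.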
