Mastering Symbolic Operations: Augmenting Language Models with Compiled Neural Networks

Yixuan Weng, Minjun Zhu, Fei Xia, Bin Li, Shizhu He, Kang Liu, Jun Zhao

TL;DR

This work tackles the limited symbolic reasoning and length generalization of language models by introducing Neural Comprehension, a framework that injects Compiled Neural Networks (CoNNs) into Transformer architectures to encode rules explicitly. CoNNs, coupled with an end-to-end gating mechanism, enable deterministic, rule-based execution while preserving the LM's implicit learning capabilities; an explicit gradient modification further strengthens rule learning within in-context learning. The authors also provide AutoCoNN, a toolkit that automatically generates CoNNs from symbolic instructions and examples, enabling scalable coverage of symbolic tasks. Empirically, Neural Comprehension delivers superior length generalization, efficiency, and interpretability on symbolic operations and arithmetic reasoning across model scales, often rivaling or surpassing tool-based approaches while maintaining end-to-end differentiability. This approach unifies explicit rule learning with implicit pattern learning, offering a significant step toward true symbolic comprehension in language models and broad applicability across tasks requiring deterministic symbol manipulation.

Abstract

Language models' (LMs) proficiency in handling deterministic symbolic reasoning and rule-based tasks remains limited due to their dependence on implicit learning from textual data. To endow LMs with genuine rule comprehension abilities, we propose "Neural Comprehension" - a framework that synergistically integrates compiled neural networks (CoNNs) into the standard transformer architecture. CoNNs are neural modules designed to explicitly encode rules through artificially generated attention weights. By incorporating CoNN modules, the Neural Comprehension framework enables LMs to accurately and robustly execute rule-intensive symbolic tasks. Extensive experiments demonstrate the superiority of our approach over existing techniques in terms of length generalization, efficiency, and interpretability for symbolic operations. Furthermore, it can be applied to LMs across different model scales, outperforming tool-calling methods in arithmetic reasoning tasks while maintaining superior inference efficiency. Our work highlights the potential of seamlessly unifying explicit rule learning via CoNNs and implicit pattern learning in LMs, paving the way for true symbolic comprehension capabilities.
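
As a rough illustration of what "encoding a rule through artificially generated attention weights" can mean, the sketch below computes running parity with attention weights that are written by hand rather than learned. It is not the authors' CoNN implementation: the function names `uniform_attention` and `parity_by_fixed_attention` are hypothetical, and the construction (uniform causal attention to average the bits, followed by a fixed readout) is only one simple way such a rule could be expressed.

```python
def uniform_attention(n):
    """Hand-set causal attention: position i attends equally to positions 0..i.

    Assumption for illustration only; no weights here are learned.
    """
    return [[1.0 / (i + 1) if j <= i else 0.0 for j in range(n)]
            for i in range(n)]


def parity_by_fixed_attention(bits):
    """Return the running parity of a 0/1 sequence using fixed attention weights."""
    n = len(bits)
    weights = uniform_attention(n)
    parities = []
    for i in range(n):
        # Attention output: the fraction of ones in the prefix bits[0..i].
        fraction_of_ones = sum(weights[i][j] * bits[j] for j in range(n))
        # Fixed readout: rescale by prefix length to recover the count,
        # then take the count modulo 2.
        count = round(fraction_of_ones * (i + 1))
        parities.append(count % 2)
    return parities


if __name__ == "__main__":
    print(parity_by_fixed_attention([1, 0, 1, 1]))
```

Because the weights follow from the rule rather than from training data, the same construction applies unchanged to inputs of other lengths, which is the property the abstract refers to as length generalization.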

Paper Structure (42 sections, 5 equations, 12 figures, 18 tables)

Figures (12)

  • Figure 1: Length generalization of T5 (fine-tuned) and of GPT-3.5 and GPT-4 (few-shot) on a symbolic operation task (Addition). To evaluate the models' proficiency, we conducted experiments on inputs ranging from 3 to 30 digits; inputs longer than 10 digits are out of distribution with respect to the training data.
  • Figure 2: Demonstration of the principles of Parity CoNN.
  • Figure 3: The architecture of the proposed Neural Comprehension framework.
  • Figure 4: Comparison of Neural Comprehension with implicit learning-based methods on symbolic operations tasks, testing length generalization. The T5 model is trained with vanilla fine-tuning, and the LLMs use few-shot learning. In Neural Comprehension, each task has its own CoNN: Parity, Reverse, Addition, and Subtraction.
  • Figure 5: The iterative process of gradient descent during training. The blue line represents a language model that incorporates Neural Comprehension, and the red line represents the original language model. Direct, a direct prediction of the final result, is provided as a reference.
  • ...and 7 more figures