Transformer-Based Models Are Not Yet Perfect At Learning to Emulate Structural Recursion
Dylan Zhang, Curt Tigges, Zory Zhang, Stella Biderman, Maxim Raginsky, Talia Ringer
TL;DR
This paper asks whether transformer-based sequence models can learn structural recursion, a fundamental yet challenging form of computation. It proposes a general framework that couples inductive, sequential encodings of recursive data types with two semantic lenses for analyzing learned behavior: stepwise δ-β-ι reduction and an Abstract State Machine (ASM) perspective. Empirically, small transformers trained on input/output pairs tend to fit shortcuts rather than true recursion, and edge cases and in-context demonstrations expose this brittleness. Pre-trained models show mixed results depending on architecture and data, while oversampling edge cases mitigates some failures. The findings help explain why current neural approaches struggle with structural recursion and outline a principled path, through syntax-semantics coupling and ASM-guided analysis, toward more reliable recursive reasoning in neural systems. The work also highlights practical considerations for data, prompts, and model design that affect robustness on recursive tasks.
Abstract
This paper investigates the ability of transformer-based models to learn structural recursion from examples. Recursion is a universal concept in both natural and formal languages. Structural recursion is central to programming language and formal mathematics tasks in which symbolic tools currently outperform neural models, such as inferring semantic relations between datatypes and emulating program behavior. We introduce a general framework that connects the abstract concepts of structural recursion in the programming language domain to concrete sequence modeling problems and to the behavior of learned models. The framework includes a representation that captures the general syntax of structural recursion, coupled with two different frameworks for understanding its semantics: one that is more natural from a programming languages perspective, and one that helps bridge that perspective with a mechanistic understanding of the underlying transformer architecture. Using this framework as a conceptual tool, we identify distinct issues under several setups. Models trained to emulate recursive computations do not fully capture the recursion; instead, they fit shortcut algorithms and therefore fail on certain edge cases that are under-represented in the training distribution. In addition, state-of-the-art large language models (LLMs) have difficulty mining recursive rules from in-context demonstrations. These LLMs also fail in interesting ways when emulating the reduction (step-wise computation) of a recursive function.
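To make the terminology concrete, the following is a minimal sketch of structural recursion over an inductive datatype, together with a flat token encoding and a step-wise trace of the computation. It is an illustration only, not the paper's encoding or its reduction semantics: the Peano-style naturals, the token format, and the names `Zero`, `Succ`, `encode`, `add`, and `reduction_trace` are all assumptions chosen for this example.

```python
from dataclasses import dataclass
from typing import List, Union


@dataclass(frozen=True)
class Zero:
    """Base constructor of the (hypothetical) inductive type Nat."""


@dataclass(frozen=True)
class Succ:
    """Recursive constructor: wraps exactly one smaller Nat."""
    pred: "Nat"


Nat = Union[Zero, Succ]


def from_int(n: int) -> Nat:
    """Build the Nat with n Succ constructors around Zero."""
    nat: Nat = Zero()
    for _ in range(n):
        nat = Succ(nat)
    return nat


def to_int(n: Nat) -> int:
    """Count the Succ constructors."""
    count = 0
    while isinstance(n, Succ):
        count += 1
        n = n.pred
    return count


def encode(n: Nat) -> str:
    """Flatten a Nat into a token sequence, e.g. 2 becomes 'S S Z'."""
    tokens = []
    while isinstance(n, Succ):
        tokens.append("S")
        n = n.pred
    tokens.append("Z")
    return " ".join(tokens)


def add(m: Nat, n: Nat) -> Nat:
    """Structural recursion on m: each call receives a strict subterm of m."""
    if isinstance(m, Zero):
        return n
    return Succ(add(m.pred, n))


def reduction_trace(m: Nat, n: Nat) -> List[str]:
    """List the intermediate terms as `add` peels one constructor per step."""
    steps = []
    wrapped = 0  # Succ constructors already moved outside the recursive call
    while True:
        term = f"add ({encode(m)}) ({encode(n)})"
        steps.append("S (" * wrapped + term + ")" * wrapped)
        if isinstance(m, Zero):
            break
        m = m.pred
        wrapped += 1
    steps.append("S " * wrapped + encode(n))  # final value, fully reduced
    return steps


if __name__ == "__main__":
    for line in reduction_trace(from_int(2), from_int(1)):
        print(line)
```

In this sketch, training on input/output pairs would correspond to showing a model only the first and last lines of such a trace, whereas emulating reduction would correspond to producing every intermediate line in order.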
