Algorithmic Language Models with Neurally Compiled Libraries
Lucas Saldyt, Subbarao Kambhampati
TL;DR
The paper tackles the problem that large language models struggle with true algorithmic reasoning due to optimization and architectural limitations. It proposes neural compilation of a differentiable library of fundamental operations into a transformer (LLaMA3.2) to enable native execution of algorithms with adaptive depth, memory, and a differentiable interpreter. Key contributions include a detailed differentiable memory/register machine architecture, a call-based program library, and experiments showing programmable arithmetic and limited sorting in small-scale models, along with insights on tokenization and representation challenges. The results indicate that differentiable computers can bootstrap algorithmic reasoning in LLMs and point to future work on parallel differentiable hardware and more robust compositionality for broader, reliable reasoning capabilities.
Abstract
Important tasks such as reasoning and planning are fundamentally algorithmic, meaning that solving them robustly requires acquiring true reasoning or planning algorithms, rather than shortcuts. Large Language Models lack true algorithmic ability primarily because of limitations in neural network optimization (the optimization algorithms, the optimization data, and the optimization objective), but also because of architectural inexpressivity. To address this, our paper proposes augmenting LLMs with a library of fundamental operations and sophisticated differentiable programs, so that common algorithms do not need to be learned from scratch. We add memory, registers, basic operations, and adaptive recurrence to a transformer architecture built on LLaMA3. Then, we define a method for directly compiling algorithms into a differentiable starting library, which is used natively and propagates gradients for optimization. In this preliminary study, we explore the feasibility of augmenting LLaMA3 with a differentiable computer, for instance by fine-tuning small transformers on simple algorithmic tasks with variable computational depth.
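
To make the idea of differentiable memory and basic operations concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes that addresses and integer values are represented as probability distributions over a fixed number of slots, which is a common design in differentiable computers. All function names (`soft_read`, `soft_write`, `soft_add`, `one_hot`) are hypothetical and chosen only for this example.

```python
import math

def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def one_hot(index, size):
    """Distribution that puts all its mass on one slot (hypothetical helper)."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def soft_read(memory, address):
    """Read a blend of memory rows, weighted by the address distribution."""
    width = len(memory[0])
    return [sum(a * row[j] for a, row in zip(address, memory))
            for j in range(width)]

def soft_write(memory, address, value):
    """Blend `value` into every row in proportion to its address weight."""
    return [[(1.0 - a) * old + a * new for old, new in zip(row, value)]
            for a, row in zip(address, memory)]

def soft_add(x, y):
    """Add two distributions over integers modulo n (circular convolution).

    With one-hot inputs this is exact modular addition; with soft inputs
    it returns the distribution of the sum, and every step is a sum of
    products, so gradients can flow through it.
    """
    n = len(x)
    return [sum(x[i] * y[(k - i) % n] for i in range(n)) for k in range(n)]

# Assumed toy setup: 4 memory slots, each holding a distribution over 0..7.
memory = [one_hot(0, 8) for _ in range(4)]
memory = soft_write(memory, one_hot(1, 4), one_hot(2, 8))  # mem[1] <- 2
memory = soft_write(memory, one_hot(2, 4), one_hot(3, 8))  # mem[2] <- 3

a = soft_read(memory, one_hot(1, 4))
b = soft_read(memory, one_hot(2, 4))
result = soft_add(a, b)  # one-hot on 5, since 2 + 3 = 5
```

With hard (one-hot) addresses and values, this behaves like an ordinary register machine. With soft addresses, for example `softmax([0.0, 0.0, 0.0, 0.0])`, reads and writes become weighted mixtures, which is what allows an optimizer to adjust which slots and operations a model uses.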
