IGC: Integrating a Gated Calculator into an LLM to Solve Arithmetic Tasks Reliably and Efficiently

Florian Dietz, Dietrich Klakow

TL;DR

This work introduces the Integrated Gated Calculator (IGC), a module that embeds a GPU-emulated, non-differentiable calculator inside a pretrained LLM to solve arithmetic tasks in a single pass, without external tools or chain-of-thought reasoning. The IGC comprises an Input Mapping, a discrete calculator, and an Output Mapping with gated connections. It is trained with an auxiliary loss and anchor-timed execution, and it achieves near-perfect accuracy on the BigBench Arithmetic benchmark while being significantly more efficient than prompting-based or tool-use methods. The approach demonstrates strong generalization, high efficiency, and interpretability, with ablation results supporting its advantages over purely finetuned baselines and CoT-like methods. The authors discuss integration into pretraining, extension to other non-differentiable operations, and the potential for broader applicability beyond arithmetic tasks.

Abstract

Solving arithmetic tasks is a simple and fundamental skill, yet modern Large Language Models (LLMs) have great difficulty with them. We introduce the Integrated Gated Calculator (IGC), a module that enables LLMs to perform arithmetic by emulating a calculator on the GPU. We finetune a Llama model with our module and test it on the BigBench Arithmetic benchmark, where it beats the State of the Art, outperforming all models on the benchmark, including models almost two orders of magnitude larger. Our approach takes only a single iteration to run and requires no external tools. It performs arithmetic operations entirely inside the LLM without the need to produce intermediate tokens. It is computationally efficient, interpretable, and avoids side-effects on tasks that do not require arithmetic operations. It reliably achieves 98% to 99% accuracy across multiple training runs and for all subtasks, including the substantially harder subtask of multiplication, which was previously unsolved.

Paper Structure (17 sections, 4 figures, 3 tables)

Figures (4)

  • Figure 1: Examples of arithmetic tasks.
  • Figure 2: Left. The IGC is inserted into a pretrained LLM after a fixed layer, in this case layer 1. It modifies the output produced by that layer. Right. During training, the IGC takes the latent activations produced by the layer as its inputs and splits them into two parts: before and after the anchor token $T_t$ at time step $t$, which has a special role for argument selection. The IGC comprises three components, two of which are trainable submodules. The Input Mapping submodule (Figure 3, left) uses the tokens before $T_t$ to extract the arithmetic task from the text and to format it for the calculator. It is trained through an auxiliary loss. The calculator itself is emulated on the GPU through a sequence of non-differentiable tensor operations. It is not a trainable component. The Output Mapping submodule (Figure 3, right) uses the results of the calculator to modify the tokens after $T_t$. It is trained by the LLM's normal loss function. Note that this image shows the training process using teacher forcing. During inference, the Input Mapping and the calculator are executed only on the iteration when the anchor token arrives. Their outputs are cached and reused on subsequent iterations.
  • Figure 3: Left. The Input Mapping submodule takes variable-length textual embeddings and extracts the numbers and operator as fixed-length categorical data. The operator is produced as a probability distribution over possible operators, and each operand as a probability distribution over possible digit values for each digit. Calculator (not shown). The calculator discretizes the distributions produced by the Input Mapping submodule by selecting the most probable number and operator. It then emulates the arithmetic operation. The resulting number is formatted using one-hot encoding. Right. The Output Mapping submodule uses the fixed-length output of the calculator to modify each of the output tokens. This uses a separate learned gating weight for each token so that it can easily learn to leave tokens unchanged.
  • Figure 4: The accuracy of various architectures on the BigBench Arithmetic benchmark as training proceeds. Multiple lines with the same color correspond to different random seeds for the same architecture.
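
The calculator step described in the Figure 3 caption (discretize the per-digit distributions, run the arithmetic operation, one-hot encode the result) and the gated modification of output tokens can be illustrated with a small sketch. This is not the authors' implementation: it uses plain Python lists instead of GPU tensors, and the operator vocabulary `OPS`, the separate sign flag, the residual form of `gated_update`, and all function names are assumptions made for illustration only.

```python
from typing import List, Sequence, Tuple

OPS = ["+", "-", "*"]  # assumed operator vocabulary, not taken from the paper


def argmax(dist: Sequence[float]) -> int:
    """Index of the most probable entry (first one on ties)."""
    return max(range(len(dist)), key=lambda i: dist[i])


def digits_to_int(digit_dists: Sequence[Sequence[float]]) -> int:
    """Discretize per-digit distributions (most significant digit first)."""
    value = 0
    for dist in digit_dists:
        value = value * 10 + argmax(dist)
    return value


def one_hot_digits(number: int, width: int) -> List[List[float]]:
    """Encode a non-negative integer as `width` one-hot digit vectors."""
    digits = str(number).zfill(width)
    if len(digits) > width:
        raise ValueError("result does not fit into the fixed output width")
    return [[1.0 if d == int(ch) else 0.0 for d in range(10)] for ch in digits]


def calculator(a_dists, b_dists, op_dist, width: int) -> Tuple[bool, List[List[float]]]:
    """Non-differentiable step: argmax, exact arithmetic, one-hot output.

    Returning the sign as a separate flag is an assumption of this sketch.
    """
    a, b = digits_to_int(a_dists), digits_to_int(b_dists)
    op = OPS[argmax(op_dist)]
    result = a + b if op == "+" else a - b if op == "-" else a * b
    return result < 0, one_hot_digits(abs(result), width)


def gated_update(hidden: Sequence[float], correction: Sequence[float], gate: float) -> List[float]:
    """Assumed residual gating: a gate of 0 leaves the token unchanged."""
    return [h + gate * c for h, c in zip(hidden, correction)]


def peaked(index: int, size: int = 10) -> List[float]:
    """Toy soft distribution with most of its mass on `index`."""
    rest = 0.1 / (size - 1)
    return [0.9 if i == index else rest for i in range(size)]


# Usage: soft predictions for "12 * 34" are discretized and evaluated exactly.
is_negative, encoded = calculator(
    [peaked(1), peaked(2)], [peaked(3), peaked(4)], peaked(2, size=3), width=4
)
result_digits = [row.index(1.0) for row in encoded]  # [0, 4, 0, 8], i.e. 408
```

Because the arithmetic happens on discrete values, no gradient flows through `calculator`, which is consistent with the captions' statement that the Input Mapping is trained through an auxiliary loss while the Output Mapping is trained by the LLM's normal loss.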