Evaluating Transformer Language Models on Arithmetic Operations Using Number Decomposition
Matteo Muffo, Aldo Cocco, Enrico Bertino
TL;DR
This work addresses arithmetic reasoning in transformer language models by introducing a digit-decomposition pipeline that guides computations across units, tens, and higher magnitudes. A GPT-2 model fine-tuned with this pipeline (Calculon) generalizes well to unseen numbers, achieving high accuracy on additions and subtractions of up to five digits, while a baseline model fine-tuned without decomposition reaches 0% accuracy on five-digit addition. The study also compares the pipeline against a spaced-digit approach and against GPT-3 in few-shot settings, finding that decomposition helps GPT-2 but not GPT-3 few-shot prompts, and that multiplication remains the hardest operation. The findings suggest that structural representations of numbers can significantly enhance neural arithmetic capabilities, with implications for improving numeracy in NLP systems and for guiding future investigations into higher-digit arithmetic and other transformer architectures.
Abstract
In recent years, Large Language Models such as GPT-3 have shown remarkable capabilities in performing NLP tasks in zero-shot and few-shot settings. However, the same experiments highlighted GPT-3's difficulty with tasks that require a certain degree of reasoning, such as arithmetic operations. In this paper we evaluate the ability of Transformer Language Models to perform arithmetic operations following a pipeline that, before performing computations, decomposes numbers into units, tens, and so on. We call the models fine-tuned with this pipeline Calculon, and we test them on additions, subtractions and multiplications using the same test sets as GPT-3. Results show an accuracy increase of 63% on the five-digit addition task. Moreover, we demonstrate the importance of the decomposition pipeline: fine-tuning the same Language Model without decomposing numbers results in 0% accuracy on the five-digit addition task.
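To make the decomposition idea concrete, the following is a minimal Python sketch of splitting a number into units, tens, and higher magnitudes, and of rebuilding it afterwards. The function names (`decompose`, `to_text`, `recompose`), the magnitude labels, and the textual format are illustrative assumptions; the abstract does not specify the exact strings used in the fine-tuning data.

```python
# Hypothetical helpers illustrating digit decomposition; names and
# output format are assumptions, not the paper's actual pipeline.
MAGNITUDES = ["units", "tens", "hundreds", "thousands", "ten-thousands"]


def decompose(n: int) -> list[tuple[int, str]]:
    """Split a non-negative integer into (digit, magnitude) pairs, units first."""
    if n < 0:
        raise ValueError("only non-negative integers are supported")
    digits = [int(ch) for ch in reversed(str(n))]
    if len(digits) > len(MAGNITUDES):
        raise ValueError("this sketch only covers numbers of up to five digits")
    return list(zip(digits, MAGNITUDES))


def to_text(n: int) -> str:
    """Render the decomposition as a plain-text string."""
    return ", ".join(f"{digit} {name}" for digit, name in decompose(n))


def recompose(pairs: list[tuple[int, str]]) -> int:
    """Rebuild the integer from its units-first decomposition."""
    return sum(digit * 10**position for position, (digit, _) in enumerate(pairs))


if __name__ == "__main__":
    print(to_text(1201))               # 1 units, 0 tens, 2 hundreds, 1 thousands
    print(recompose(decompose(1201)))  # 1201
```

Listing the units first mirrors the order in which column-wise addition and subtraction proceed, which is one plausible reason such a representation could help a model; that ordering is a design choice of this sketch.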
