
RevOrder: A Novel Method for Enhanced Arithmetic in Language Models

Si Shen, Peijun Shen, Danhao Zhu

TL;DR

The paper introduces the Count of Sequential Intermediate Digits (CSID), a metric for the difficulty of an arithmetic equation, and RevOrder, a technique that reverses the output digits so that CSID stays at $\mathcal{O}(1)$ for addition, subtraction, and nD by 1D multiplication. It shows that LLM performance degrades as CSID grows, and that RevOrder reaches perfect accuracy on basic arithmetic operations and substantially improves division on large numbers, while remaining cost-effective in both training and inference. The authors validate RevOrder on synthetic arithmetic data; fine-tuning LLaMA2-7B with RevOrder on GSM8K reduces equation calculation errors by 46% and raises the overall score from 41.6 to 44.4, suggesting that integrating the method into pretraining could further improve arithmetic capability.

Abstract

This paper presents RevOrder, a novel technique aimed at improving arithmetic operations in large language models (LLMs) by reversing the output digits in addition, subtraction, and n-digit by 1-digit (nD by 1D) multiplication tasks. Our method reduces the Count of Sequential Intermediate Digits (CSID), a new metric we introduce to assess equation complexity, to $\mathcal{O}(1)$. In comprehensive testing, RevOrder not only achieves perfect accuracy in basic arithmetic operations but also substantially boosts LLM performance in division tasks, particularly with large numbers where traditional models struggle. RevOrder is cost-effective to implement in both the training and inference phases. Moreover, applying RevOrder to fine-tune the LLaMA2-7B model on the GSM8K math task reduces equation calculation errors by 46% and increases the overall score from 41.6 to 44.4.
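To make the digit-reversal idea concrete, the following is a minimal sketch and not the authors' implementation. It assumes non-negative integers and borrows the `r|` marker from the Figure 1 caption to flag reversed digits; the function names `add_reversed` and `decode_reversed` are hypothetical. The point it illustrates is that when the result is written least-significant digit first, each output digit can be emitted as soon as it is computed, and the only state carried forward is a single carry digit.

```python
def add_reversed(a: int, b: int) -> str:
    """Add two non-negative integers and return the sum with its digits
    in reverse order (least-significant first), prefixed by 'r|'.

    The 'r|' prefix follows the marker described in the Figure 1 caption;
    the exact output format here is an assumption for illustration.
    """
    if a < 0 or b < 0:
        raise ValueError("this sketch only handles non-negative integers")

    xs = str(a)[::-1]  # digits of a, least-significant first
    ys = str(b)[::-1]  # digits of b, least-significant first
    out = []
    carry = 0
    for i in range(max(len(xs), len(ys))):
        da = int(xs[i]) if i < len(xs) else 0
        db = int(ys[i]) if i < len(ys) else 0
        # Each output digit depends only on the current digit pair and
        # the carry, so it can be emitted immediately.
        carry, digit = divmod(da + db + carry, 10)
        out.append(str(digit))
    if carry:
        out.append(str(carry))
    return "r|" + "".join(out)


def decode_reversed(text: str) -> int:
    """Convert an 'r|'-prefixed reversed-digit string back to an integer."""
    if not text.startswith("r|"):
        raise ValueError("expected the 'r|' reversed-order marker")
    return int(text[2:][::-1])


if __name__ == "__main__":
    encoded = add_reversed(123, 989)
    print(encoded)                   # r|2111
    print(decode_reversed(encoded))  # 1112
```

Writing the same sum most-significant digit first would require resolving every carry before the first digit could be produced, which is the kind of sequential intermediate work that the CSID metric is described as counting.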

Paper Structure (34 sections, 4 equations, 9 figures, 3 tables)


Figures (9)

  • Figure 1: An illustration of performing addition using various methods. In the RevOrder method, the 'r|' symbol indicates that the subsequent digits are presented in reverse order.
  • Figure 2: LLM performance on equations with varying CSIDs.
  • Figure 3: An error example of division by RevOrder.
  • Figure 4: Analysis of the rollback ratio in division. (a) Test precision vs. rollback ratio for $12D \div 6D$ division. (b) Probability of rollbacks during testing across different digit sizes.
  • Figure 5: The number of extra tokens required for multiplication and division.
  • ...and 4 more figures