Learning Mathematical Rules with Large Language Models

Antoine Gorceix, Bastien Le Chenadec, Ahmad Rammal, Nelson Vadori, Manuela Veloso

TL;DR

This work studies whether large language models can learn specific mathematical rules, such as distributivity or simplifying equations, and reuse them in the context of word problems.

Abstract

In this paper, we study the ability of large language models to learn specific mathematical rules such as distributivity or simplifying equations. We present an empirical analysis of their ability to generalize these rules, as well as to reuse them in the context of word problems. For this purpose, we provide a rigorous methodology to build synthetic data incorporating such rules, and perform fine-tuning of large language models on such data. Our experiments show that our model can learn and generalize these rules to some extent, as well as suitably reuse them in the context of word problems.
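
To make the idea of rule-based synthetic data concrete, the following is a minimal sketch of how distributivity examples could be generated with variable names drawn from a split vocabulary, so that evaluation uses names unseen during training. It is not the authors' pipeline: the word list, the function names `split_vocabulary` and `distributivity_example`, and the prompt/target string format are all assumptions chosen for illustration.

```python
import random


def split_vocabulary(vocab, train_fraction, seed=0):
    """Shuffle a vocabulary and split it into disjoint train / held-out parts."""
    rng = random.Random(seed)
    shuffled = list(vocab)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def distributivity_example(names, rng):
    """Build one (prompt, target) pair for (a + b) * (c + d).

    The target is the fully expanded form. The string format is hypothetical.
    """
    a, b, c, d = rng.sample(names, 4)  # four distinct variable names
    prompt = f"({a} + {b}) * ({c} + {d})"
    target = f"{a} * {c} + {a} * {d} + {b} * {c} + {b} * {d}"
    return prompt, target


# Hypothetical toy vocabulary; a real setup would draw from the tokenizer's vocabulary.
vocab = ["dog", "sky", "tree", "lamp", "river",
         "stone", "cloud", "piano", "apple", "wheel"]

train_names, heldout_names = split_vocabulary(vocab, train_fraction=0.5)

rng = random.Random(1)
train_set = [distributivity_example(train_names, rng) for _ in range(3)]
eval_set = [distributivity_example(heldout_names, rng) for _ in range(3)]

for prompt, target in train_set:
    print(prompt, "->", target)
```

Because the two name lists are disjoint, any accuracy measured on `eval_set` reflects applying the rule to symbols never seen in training, which is the kind of generalization the abstract refers to.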

Paper Structure

This paper contains 27 sections, 20 equations, 18 figures, 18 tables.

Figures (18)

  • Figure 1: Word problem example - quadratic polynomial.
  • Figure 2: Word problem example - resistor circuit.
  • Figure 3: Word problem example - fruit baskets.
  • Figure 4: Validation accuracy on the distributivity rule for different vocabulary sizes. Each model is trained on $x\%$ of the tokenizer's vocabulary and evaluated on the complementary $(100-x)\%$. The dashed lines separate the parameters seen during training from those unseen. From left to right and top to bottom: $x=1$, $10$, $50$, $75$, $95$.
  • Figure 5: Detailed resolution of a system of equations by recursive call of our model on each step $i \to i+1$. The variables are $dog,sky$.
  • ...and 13 more figures