Enhancing Mathematical Reasoning in LLMs with Background Operators

Jiajun Chen, Yik-Cheung Tam

TL;DR

This work introduces MATH-Prolog, a corpus of Prolog-based solutions for competition-level counting and probability problems, built from a standardized set of 54 background operators and problem-specific predicates. It leverages a cross-validated self-training framework to iteratively generate and verify diverse Prolog solutions during fine-tuning of large language models, achieving high accuracy on unseen problems. Key findings show that 5-fold cross-validated self-training yields 84.6% accuracy on the cross-validated set and 84.8% on the test set after augmentation, with background operators in prompts enhancing solution coverage. The approach advances computable, verifiable mathematical reasoning in LLMs and suggests potential for broader domain generalization with structured predicate graphs.

Abstract

We propose utilizing background operators for mathematical reasoning in large language models (LLMs). To achieve this, we define a set of fundamental mathematical predicates as the basic building blocks. For each mathematical problem, we develop a Prolog solution that includes problem-specific predicates and intermediate predicates derived from these background operators, ensuring that each solution adheres to the defined operator set. We introduce the MATH-Prolog corpus, which is derived from the counting and probability categories of the MATH corpus. For efficient data augmentation, we apply K-fold cross-validated self-training. This method incrementally generates new Prolog solutions for each fold and adds those verified as correct to the training set throughout model training. Our experimental results demonstrate that 5-fold cross-validated self-training effectively identifies new, accurate Prolog solutions, achieving an accuracy of 84.6% on the cross-validated set and 84.8% on the test set when fine-tuning the Meta-Llama-3.1-8B-Instruct model. This approach successfully uncovers new solutions with fully computable inference steps for previously unseen problems. Additionally, incorporating the background mathematical predicates into the prompt enhances solution coverage.
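
The K-fold cross-validated self-training loop described in the abstract can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `kfold_self_training`, `train`, and `verify` are hypothetical, solutions are treated as opaque strings, and the exact schedule (when the model is retrained, how many candidates are sampled) is assumed rather than taken from the paper.

```python
import random
from typing import Callable, Dict, List, Set, Tuple

Problem = str
Solution = str  # e.g. the text of a Prolog program (assumption)
Generator = Callable[[Problem], List[Solution]]


def kfold_self_training(
    seed: Dict[Problem, Set[Solution]],
    train: Callable[[List[Tuple[Problem, Solution]]], Generator],
    verify: Callable[[Problem, Solution], bool],
    k: int = 5,
    rounds: int = 1,
    rng_seed: int = 0,
) -> Dict[Problem, Set[Solution]]:
    """Grow a pool of verified solutions with K-fold self-training.

    `train` (hypothetical) fits a model on (problem, solution) pairs and
    returns a function that proposes candidate solutions for a problem.
    `verify` (hypothetical) runs a candidate and checks its answer.
    """
    # Copy the seed so the caller's data is not mutated.
    pool = {p: set(sols) for p, sols in seed.items()}
    problems = sorted(pool)
    random.Random(rng_seed).shuffle(problems)
    folds = [problems[i::k] for i in range(k)]

    for _ in range(rounds):
        for held_out in folds:
            held = set(held_out)
            # Train only on problems outside the held-out fold, so every
            # candidate is generated for a problem the model has not seen.
            pairs = [(p, s) for p in problems if p not in held
                     for s in sorted(pool[p])]
            generate = train(pairs)
            for p in held_out:
                for candidate in generate(p):
                    # Keep only candidates that pass verification.
                    if verify(p, candidate):
                        pool[p].add(candidate)
    return pool
```

Because verified candidates are added to the pool as soon as a fold is processed, later folds train on the augmented data, which is one way to read "incorporating those verified as correct into the training set throughout the model training process".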

Paper Structure

This paper contains 24 sections, 1 equation, 4 figures, 3 tables, 1 algorithm.

Figures (4)

  • Figure 1: Computation graph of a Prolog solution for the problem "What is the sum of all integer values $n$ for which $\binom{26}{13}+\binom{26}{n}=\binom{27}{14}$?".
  • Figure 2: A Prolog code solution for the problem "If two numbers are randomly chosen without replacement from $\{1, 2, 3, 4, 5\}$, what is the probability their sum is greater than their product?".
  • Figure 3: Accuracy for 5-fold cross-validated self-training.
  • Figure 4: Diversity between original solution and augmented solution.
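
The two example problems quoted in the captions of Figures 1 and 2 are small enough to check by brute force. The sketch below is an independent Python check, not the Prolog solutions shown in the figures; the variable names are illustrative only.

```python
from fractions import Fraction
from itertools import combinations
from math import comb

# Figure 1 problem: integers n with C(26,13) + C(26,n) = C(27,14).
# C(26,n) is zero outside 0..26, and C(26,13) alone does not equal
# C(27,14), so searching 0..26 covers every integer solution.
target = comb(27, 14)
solutions = [n for n in range(27) if comb(26, 13) + comb(26, n) == target]
sum_n = sum(solutions)

# Figure 2 problem: two numbers drawn without replacement from {1,...,5};
# probability that their sum exceeds their product.
pairs = list(combinations(range(1, 6), 2))
favorable = [(a, b) for a, b in pairs if a + b > a * b]
probability = Fraction(len(favorable), len(pairs))

print(solutions, sum_n)   # [12, 14] 26
print(probability)        # 2/5
```

The first result follows from Pascal's rule, since $\binom{27}{14}=\binom{26}{13}+\binom{26}{14}$ and $\binom{27}{14}=\binom{27}{13}=\binom{26}{13}+\binom{26}{12}$. The second follows because $a+b>ab$ is equivalent to $(a-1)(b-1)<1$, which for positive integers holds only when one of the two numbers is 1.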