Solving for X and Beyond: Can Large Language Models Solve Complex Math Problems with More-Than-Two Unknowns?

Kuei-Chun Kao; Ruochen Wang; Cho-Jui Hsieh

TL;DR

This paper proposes Formulate-and-Solve, a generalized prompting strategy that handles problems with an arbitrary number of unknowns. The strategy improves LLM performance on the BeyondX benchmark and offers deeper insight into the computational limits of LLMs on more complex mathematical problems.

Abstract

Large Language Models (LLMs) have demonstrated remarkable performance in solving math problems, a hallmark of human intelligence. However, despite high success rates, current benchmarks often feature simple problems with only one or two unknowns, which do not sufficiently challenge the models' reasoning capacities. This paper introduces a novel benchmark, BeyondX, designed to address these limitations by incorporating problems with multiple unknowns. Recognizing the challenges in proposing multi-unknown problems from scratch, we developed BeyondX using an innovative automated pipeline that progressively increases complexity by expanding the number of unknowns in simpler problems. An empirical study on BeyondX reveals that the performance of existing LLMs, even those fine-tuned specifically on math tasks, decreases significantly as the number of unknowns increases, with a performance drop of up to 70% observed in GPT-4. To tackle these challenges, we propose the Formulate-and-Solve strategy, a generalized prompting approach that effectively handles problems with an arbitrary number of unknowns. Our findings reveal that this strategy not only enhances LLM performance on the BeyondX benchmark but also provides deeper insights into the computational limits of LLMs when faced with more complex mathematical challenges.
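
The abstract does not spell out how Formulate-and-Solve works internally. As a rough illustration of what "more than two unknowns" means in practice, the sketch below assumes a word problem has already been formulated as a linear system and solves it exactly. The function name `solve_linear_system` and the three-unknown fruit example are hypothetical and are not taken from the paper or from BeyondX.

```python
from fractions import Fraction


def solve_linear_system(coeffs, consts):
    """Solve A x = b exactly with Gauss-Jordan elimination.

    coeffs: square list of rows (the matrix A); consts: the vector b.
    Returns the solution as a list of Fractions.
    Raises ValueError when the system has no unique solution.
    """
    n = len(coeffs)
    # Build the augmented matrix [A | b] using exact rational arithmetic.
    aug = [[Fraction(v) for v in row] + [Fraction(b)]
           for row, b in zip(coeffs, consts)]
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero entry in this column.
        pivot = next((r for r in range(col, n) if aug[r][col] != 0), None)
        if pivot is None:
            raise ValueError("system has no unique solution")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        # Scale the pivot row so the pivot entry becomes 1.
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col and aug[r][col] != 0:
                factor = aug[r][col]
                aug[r] = [a - factor * p for a, p in zip(aug[r], aug[col])]
    return [row[-1] for row in aug]


# Hypothetical three-unknown problem (illustrative only):
#   x + y + z = 12   (total pieces of fruit)
#   x - y     = 1    (one more apple than bananas)
#       y - z = 1    (one more banana than cherries)
solution = solve_linear_system(
    [[1, 1, 1], [1, -1, 0], [0, 1, -1]],
    [12, 1, 1],
)
print(solution)
```

Exact fractions are used here so that the answer can be compared to a reference value without floating-point tolerance; a symbolic solver would be a natural alternative for non-linear systems.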

Paper Structure (52 sections, 6 figures, 21 tables, 1 algorithm)

Introduction
C1: BeyondX - The first multi-unknown algebraic benchmark.
C2: Existing LLMs struggle with increasing unknowns.
C3: Formulate-and-Solve - A prompting method to tackle multi-unknown problems.
Related Work
Math Word Problem Generation
Math Word Problem Solver
Math Reasoning with LLMs
Automatic Generation of Multi-Unknown Algebra Problems via Progressive Expansion
Challenges for Constructing Multi-Unknown Datasets
Generating new problems with LLMs.
Limitations of naive generation.
Generating New Problems via Progressive Expansion
Pipeline overview.
Multi-step problem expansion.
...and 37 more sections

Figures (6)

Figure 1: An example question of multi-unknown algebra problem generation and its corresponding reasoning steps. The prompts used for each step can be found in Appendix \ref{tab:mwp_generation_instruction}.
Figure 2: Preliminary study of different LLMs and prompting methods on multi-unknown algebra datasets.
Figure 3: The overview of Automatic Solver of Algebra Problems.
Figure 4: The performance of different existing open-source models.
Figure 5: The performance on different numbers of unknowns.
...and 1 more figure
