Stepwise Self-Consistent Mathematical Reasoning with Large Language Models

Zilong Zhao; Yao Rong; Dongyang Guo; Emek Gözlüklü; Emir Gülboy; Enkelejda Kasneci

TL;DR

This work addresses the challenge of complex mathematical reasoning by introducing Stepwise Self-Consistent Chain-of-Thought (SSC-CoT), which discovers critical intermediate steps via the intersection of multiple reasoning chains and augments them with a domain knowledge graph. A new trigonometry dataset, TriMaster100, contains 100 questions whose solutions are broken into scored intermediate steps, providing a fine-grained benchmark of reasoning quality beyond final answers; it is complemented by evaluation on MATH level 5. SSC-CoT shows substantial gains over state-of-the-art baselines, including a 34% improvement on TriMaster100 and a 7.2% margin on MATH level 5, driven by effective intermediate-step selection, knowledge-graph-based information retrieval, and verification. The work includes ablations, qualitative analyses, and a release of code and TriMaster100 to support further research in robust, interpretable mathematical reasoning with LLMs.

Abstract

Using Large Language Models for complex mathematical reasoning is difficult, primarily due to the complexity of multi-step reasoning. The main challenges of this process include (1) selecting critical intermediate results to advance the procedure, and (2) limited exploration of potential solutions. To address these issues, we introduce a novel algorithm, namely Stepwise Self-Consistent Chain-of-Thought (SSC-CoT). SSC-CoT employs a strategy of selecting intermediate steps based on the intersection of various reasoning chains. Additionally, SSC-CoT enables the model to discover critical intermediate steps by querying a knowledge graph comprising relevant domain knowledge. To validate SSC-CoT, we present a new dataset, TriMaster100, tailored for complex trigonometry problems. This dataset contains 100 questions, with each solution broken down into scored intermediate steps, facilitating a comprehensive evaluation of the mathematical reasoning process. On TriMaster100, SSC-CoT triples the effectiveness of the state-of-the-art methods. Furthermore, we benchmark SSC-CoT on the widely recognized complex mathematical question dataset, MATH level 5, and it surpasses the second-best method by 7.2% in accuracy. Code and the TriMaster100 dataset can be found at: https://github.com/zhao-zilong/ssc-cot.
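The idea of selecting intermediate steps from the intersection of several reasoning chains can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `normalize` and `overlapping_steps`, the `min_chains` parameter, and the string-based equivalence check are all assumptions made for illustration. The actual method may compare intermediate results in a different way, and the sketch omits the knowledge-graph query and verification stages.

```python
from collections import defaultdict


def normalize(step: str) -> str:
    # Hypothetical equivalence check: two steps match if they are equal
    # after removing whitespace and lowercasing. This is an assumption,
    # not the comparison used in the paper.
    return "".join(step.split()).lower()


def overlapping_steps(chains, min_chains=2):
    """Map each normalized intermediate result that occurs in at least
    `min_chains` distinct chains to the sorted indices of those chains."""
    seen = defaultdict(set)
    for idx, chain in enumerate(chains):
        for step in chain:
            seen[normalize(step)].add(idx)
    return {s: sorted(ix) for s, ix in seen.items() if len(ix) >= min_chains}


# Three illustrative reasoning chains, each a list of intermediate results.
chains = [
    ["sin(x)^2 + cos(x)^2 = 1", "tan(x) = 2"],
    ["tan(x) = 2", "sin(2x) = 4/5"],
    ["cos(x) = 1/2", "Sin(2x) = 4/5"],
]

shared = overlapping_steps(chains)
print(shared)
```

In this toy input, only the results reached independently by more than one chain are kept as candidates for the next reasoning round; results that appear in a single chain are dropped.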

Paper Structure (35 sections, 13 equations, 6 figures, 6 tables, 1 algorithm)

Introduction
Related Work
LLMs for Mathematical Reasoning.
Retrieval Augmentation for Mathematical Reasoning
TriMaster100 Dataset
Dataset Construction
Human-Level Performance
Stepwise Self-Consistent Chain of Thought
SSC-CoT Workflow
Knowledge Graph Design and Exploration
Design.
Information Retrieval.
Intermediate Result Selection
Experiment
Experiment Setup
...and 20 more sections

Figures (6)

Figure 1: Our SSC-CoT (Right) improves the ability of LLMs (Left) to solve complex mathematical questions.
Figure 2: Example of annotated intermediate steps with their scores in TriMaster100.
Figure 3: An example of Stepwise Self-Consistent Chain-of-Thought workflow.
Figure 4: A subset of knowledge graph for trigonometry.
Figure 5: Scenarios in which chains of thought contain more than one group of overlapping intermediate results. (a) Two overlapping intermediate result groups, with overlapping nodes in different chains. (b) Two overlapping intermediate result groups, with nodes from different groups appearing in one chain. (c) Two overlapping intermediate result groups, with nodes from different groups appearing in two chains. (d) More than two overlapping intermediate result groups.
...and 1 more figures
