AbstRaL: Augmenting LLMs' Reasoning by Reinforcing Abstract Thinking

Silin Gao, Antoine Bosselut, Samy Bengio, Emmanuel Abbe

TL;DR

AbstRaL addresses GSM reasoning robustness by learning abstract reasoning patterns $\mathcal{A}$ through reinforcement learning on Granularly-decomposed Abstract Reasoning (GranulAR) data. The framework first abstracts a problem into an abstract question $X^{\mathcal{A}}$, its conditions $\mathcal{C}$, and an abstract answer $Y^{\mathcal{A}}$, then retrieves the de-contextualized abstraction $\mathcal{A}$ and uses a symbolic solver to derive the final answer. Training is guided by the model-free rewards $r_{answer}$ and $r_{symbolic}$ within GRPO. Empirically, AbstRaL improves GSM reasoning robustness against instantiation and distractor perturbations across multiple seeds and models, and shows zero-shot improvements on a wide range of OOD mathematical and general reasoning tasks, suggesting that abstract thinking can broadly enhance generalizability. The work introduces GranulAR data and a fine-grained RL-based learning regime that links abstract reasoning with symbolic tools, offering a scalable path to more robust and transferable reasoning in LLMs.

Abstract

Recent studies have shown that large language models (LLMs), especially smaller ones, often lack robustness in grade school math (GSM) reasoning. In particular, they tend to experience performance drops when faced with distribution shifts, such as changes to numerical or nominal variables, or insertions of distracting clauses. A possible strategy to address this involves generating synthetic data to further "instantiate" reasoning problems on potential variations. In this work, we instead focus on the strategy of "abstracting" reasoning problems. This not only helps counteract distribution shifts but also facilitates the connection to symbolic tools for deriving solutions. Focusing on GSM, we find that this abstraction process is better acquired through reinforcement learning (RL) than through supervised fine-tuning alone, which often fails to produce faithful abstractions. Our method, AbstRaL -- which promotes abstract reasoning in LLMs using RL on granular abstraction data -- significantly mitigates performance degradation on recent GSM perturbation benchmarks. Moreover, improving GSM robustness via AbstRaL is shown to also implicitly benefit LLMs' capabilities on OOD mathematical and general reasoning tasks, indicating that abstract thinking broadly enables better generalizability.

Paper Structure

This paper contains 24 sections, 4 equations, 5 figures, 12 tables.

Figures (5)

  • Figure 1: Two paraphrased queries $X$ and $\tilde{X}$, having the same solution $Y$, can both be handled by a common abstraction $\mathcal{A}$.
  • Figure 2: Our AbstRaction Learning (AbstRaL) method effectively improves the GSM reasoning robustness of LLMs, especially when facing variations of relevant input conditions and interference from distracting conditions. We present the average accuracy of all our tested LLMs on GSM-Plus (Li et al., 2024), including the original GSM8K testing set (Original Reasoning Problem), the testing sets with numerical variations (Vary Relevant Conditions), averaged across three portions (digit expansion, integer-decimal-fraction conversion and numerical substitution), the testing set with problem rephrasing (Vary Problem Contexts), and the testing set with distractor insertion (Add Distracting Conditions).
  • Figure 3: Learning strategies to improve reasoning robustness with respect to distribution shifts. (a) Augmenting the amount of learning data by synthesizing more reasoning instances. (b) Directly learning to construct the underlying abstraction based on the input, including: (1) condition recognition, (2) abstract reasoning, (3) abstraction retrieval and (4) symbolic derivation.
  • Figure 4: Overview of GranulAR training data construction, which consists of an instance rewriting procedure to rewrite existing Socratic CoT data $(\mathcal{X}, \mathcal{Y})$ into fine-grained abstract reasoning data $(\mathcal{X}^{\mathcal{A}}, \mathcal{C}, \mathcal{Y}^{\mathcal{A}}, \mathcal{A})$, followed by an answer verification procedure to check the correctness of the rewriting.
  • Figure 5: Illustration of the abstraction rewards in our reinforcement learning approach, including the symbolic distance reward $r_{symbolic}$ and the answer correctness reward $r_{answer}$.
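
The four steps named in the Figure 3 caption (condition recognition, abstract reasoning, abstraction retrieval, symbolic derivation) can be illustrated with a toy sketch. This is not the authors' implementation: the symbol names `in0`/`out0`, the `[in0]` placeholder format, the `<<...>>` equation markers, and every function name below are assumptions made for illustration. The abstract answer that an LLM would generate in step 2 is hard-coded, and a small arithmetic evaluator stands in for the symbolic solver.

```python
import ast
import operator
import re

# Arithmetic operators accepted by the toy solver (a stand-in for a symbolic tool).
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}


def recognize_conditions(question):
    """Step 1: replace each number with a symbol in0, in1, ... (hypothetical format)."""
    conditions = {}

    def to_symbol(match):
        name = f"in{len(conditions)}"
        text = match.group()
        conditions[name] = float(text) if "." in text else int(text)
        return f"[{name}]"

    abstract_question = re.sub(r"\d+(?:\.\d+)?", to_symbol, question)
    return abstract_question, conditions


# Step 2 (abstract reasoning) is what the LLM would produce; hard-coded here.
ABSTRACT_ANSWER = ("All apples: <<out0 = in0 * in1>>. "
                   "Apples left: <<out1 = out0 - in2>>.")


def retrieve_abstraction(abstract_answer):
    """Step 3: extract the ordered list of equations from the abstract answer."""
    return re.findall(r"<<(.+?)>>", abstract_answer)


def _evaluate(node, env):
    """Evaluate a parsed arithmetic expression over the known symbol values."""
    if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
        return _OPS[type(node.op)](_evaluate(node.left, env),
                                   _evaluate(node.right, env))
    if isinstance(node, ast.Name):
        return env[node.id]
    if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
        return node.value
    raise ValueError("unsupported expression")


def derive(abstraction, conditions):
    """Step 4: evaluate the equations in order and return the last result."""
    env = dict(conditions)
    result = None
    for equation in abstraction:
        target, expression = (part.strip() for part in equation.split("=", 1))
        result = _evaluate(ast.parse(expression, mode="eval").body, env)
        env[target] = result
    return result


def solve(question):
    """Run steps 1, 3 and 4 against the hard-coded abstract answer."""
    _, conditions = recognize_conditions(question)
    return derive(retrieve_abstraction(ABSTRACT_ANSWER), conditions)
```

Under these assumptions, `solve("Tom has 3 boxes with 4 apples each and eats 5 apples. How many are left?")` returns `7`, and a numerically perturbed variant with 10 boxes of 2.5 apples returns `20.0` from the same two equations. This mirrors the point of Figure 1: once the abstraction is fixed, changing the surface values does not change the reasoning.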