Searching Meta Reasoning Skeleton to Guide LLM Reasoning

Ziying Zhang, Yaqing Wang, Quanming Yao

TL;DR

This work introduces AutoMR, a framework for automatically discovering query-aware meta-reasoning skeletons for LLMs by representing skeletons as single-source, edge-heterogeneous DAGs and searching over a DAG-based space under a token budget. A dynamic skeleton sampling algorithm expands the skeleton node-by-node during inference, conditioning edge choices on the evolving base reasoning context to achieve efficient adaptation. The method uses REINFORCE to train the skeleton policy and demonstrates superior reasoning accuracy and scaling efficiency across math Q&A and general multiple-choice tasks compared with manually designed skeletons and AutoML baselines. Results show that AutoMR reliably outperforms existing meta-reasoning approaches while maintaining practical training and inference costs, highlighting its potential for query-specific, context-adaptive reasoning guidance.
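The REINFORCE training mentioned above can be illustrated with a toy sketch. This is not the authors' training code: the three option names, the fixed reward function, the learning rate, and the running-mean baseline are all assumptions made for illustration. It only shows the generic score-function update, where the logits of a categorical policy over edge options are nudged by (reward - baseline) times the gradient of the log-probability of the sampled option.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def reinforce_train(reward_fn, num_options, steps=2000, lr=0.1, seed=0):
    """Train a categorical policy with the REINFORCE update.

    reward_fn is a hypothetical stand-in for "did the LLM answer correctly
    when guided by this choice"; here it is just a function of the action.
    """
    rng = random.Random(seed)
    logits = [0.0] * num_options
    baseline = 0.0  # running mean of rewards, used to reduce variance
    for _ in range(steps):
        probs = softmax(logits)
        action = rng.choices(range(num_options), weights=probs)[0]
        reward = reward_fn(action)
        advantage = reward - baseline
        for k in range(num_options):
            # d/d logit_k of log softmax(logits)[action] = 1[k == action] - probs[k]
            grad_log_prob = (1.0 if k == action else 0.0) - probs[k]
            logits[k] += lr * advantage * grad_log_prob
        baseline += 0.05 * (reward - baseline)
    return softmax(logits)

# Hypothetical edge options; only "verify" is rewarded in this toy setup.
OPTIONS = ["zero", "decompose", "verify"]
final_probs = reinforce_train(
    lambda a: 1.0 if OPTIONS[a] == "verify" else 0.0, len(OPTIONS)
)
print(dict(zip(OPTIONS, (round(p, 3) for p in final_probs))))
```

In this toy setting the policy concentrates its probability mass on the rewarded option. In the paper's setting the policy would additionally be conditioned on the query and the reasoning context.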

Abstract

Meta reasoning behaviors act as a skeleton that guides large language model (LLM) reasoning and thereby help improve reasoning performance. However, prior work implements the meta reasoning skeleton with a manually designed structure, which limits its ability to adapt to query-specific requirements and to capture intricate logical dependencies among reasoning steps. To address these challenges, we represent the meta reasoning skeleton as a directed acyclic graph (DAG), which unifies the skeletons proposed in prior work and models intricate logical dependencies. We then propose AutoMR, a framework inspired by automated machine learning (AutoML) that automatically searches for query-aware meta reasoning skeletons. Specifically, we construct a search space based on the DAG representation of skeletons and then formulate the search problem. We design a dynamic skeleton sampling algorithm that expands the meta reasoning skeleton along with the reasoning context at inference time. This algorithm can efficiently derive any meta reasoning skeleton in the search space and adapt the skeleton to the evolving base reasoning context, thus enabling efficient query-aware skeleton search. We conduct experiments on a wide range of benchmark datasets. Experimental results show that AutoMR broadly achieves better reasoning performance than previous work.
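The dynamic skeleton sampling idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: the strategy names, the `choose_edge` and `run_step` callbacks, the whitespace word count used as a token proxy, and the rule that expansion stops when a new node receives no incoming edge are all hypothetical. The sketch only shows the general shape: nodes are added one at a time, each candidate edge from an earlier node is assigned a strategy or the zero option based on the context so far, and expansion halts once the token budget would be exceeded.

```python
import random
from typing import Callable, List, Tuple

ZERO = "zero"                         # the "no edge" option
STRATEGIES = ["decompose", "verify"]  # hypothetical strategy names

Edge = Tuple[int, int, str]  # (source node, target node, strategy)

def sample_skeleton(
    choose_edge: Callable[[int, int, List[str]], str],
    run_step: Callable[[int, List[Edge]], str],
    query: str,
    token_budget: int,
) -> Tuple[List[Edge], List[str]]:
    """Grow a skeleton node by node; node 0 is the query (single source)."""
    context = [query]  # context[i] holds the text produced at node i
    edges: List[Edge] = []
    used = 0
    node = 1
    while True:
        # Decide an option for every edge from an earlier node to the new node,
        # conditioning on the reasoning context produced so far.
        incoming = []
        for src in range(node):
            option = choose_edge(src, node, context)
            if option != ZERO:
                incoming.append((src, node, option))
        if not incoming:
            break  # assumed stopping rule: a node with no incoming edge ends expansion
        text = run_step(node, incoming)     # stand-in for an LLM reasoning step
        cost = max(1, len(text.split()))    # crude token proxy (assumption)
        if used + cost > token_budget:
            break
        used += cost
        edges.extend(incoming)
        context.append(text)
        node += 1
    return edges, context

# Usage with a random policy and a stub reasoning step (both hypothetical).
rng = random.Random(0)

def random_policy(src: int, dst: int, context: List[str]) -> str:
    return rng.choice([ZERO] + STRATEGIES)

def stub_step(node: int, incoming: List[Edge]) -> str:
    return f"step {node} using " + " ".join(s for _, _, s in incoming)

edges, context = sample_skeleton(random_policy, stub_step, "What is 2 + 3?", token_budget=40)
print(edges)
```

Because every edge points from a lower-numbered node to a higher-numbered one, the sampled graph is acyclic by construction, and node 0 remains its only source.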

Paper Structure

This paper contains 25 sections, 2 theorems, 9 equations, 6 figures, 5 tables, 2 algorithms.

Key Result

Proposition 1

Sequential, parallel, and tree structured skeletons can all be represented as single-source, edge-heterogeneous DAGs.
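As a small illustration of Proposition 1 (not its proof), the sketch below encodes one sequential, one parallel, and one tree-structured skeleton as typed edge lists and checks that each is a DAG whose only source is node 0. The edge labels, the merge node in the parallel example, and the helper name `is_single_source_dag` are assumptions made for this example.

```python
from collections import defaultdict, deque

def is_single_source_dag(num_nodes, edges):
    """Return True if the typed edge list forms a DAG whose only source is node 0.

    edges is a list of (source, target, strategy) triples; the strategy label
    makes the graph edge-heterogeneous but does not affect the structural check.
    """
    indegree = [0] * num_nodes
    children = defaultdict(list)
    for src, dst, _ in edges:
        indegree[dst] += 1
        children[src].append(dst)
    sources = [n for n in range(num_nodes) if indegree[n] == 0]
    if sources != [0]:
        return False
    # Kahn's algorithm: the graph is acyclic iff every node gets visited.
    remaining = indegree[:]
    queue = deque(sources)
    visited = 0
    while queue:
        n = queue.popleft()
        visited += 1
        for c in children[n]:
            remaining[c] -= 1
            if remaining[c] == 0:
                queue.append(c)
    return visited == num_nodes

# Hypothetical strategy labels on the edges.
sequential = [(0, 1, "decompose"), (1, 2, "solve"), (2, 3, "verify")]
parallel = [(0, 1, "solve"), (0, 2, "solve"), (0, 3, "solve"),
            (1, 4, "aggregate"), (2, 4, "aggregate"), (3, 4, "aggregate")]
tree = [(0, 1, "decompose"), (0, 2, "decompose"),
        (1, 3, "solve"), (1, 4, "solve"), (2, 5, "solve")]
cyclic = [(0, 1, "solve"), (1, 2, "verify"), (2, 1, "revise")]

print(is_single_source_dag(4, sequential))  # True
print(is_single_source_dag(5, parallel))    # True
print(is_single_source_dag(6, tree))        # True
print(is_single_source_dag(3, cyclic))      # False
```

The cyclic example is rejected because nodes 1 and 2 never reach in-degree zero during the topological pass.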

Figures (6)

  • Figure 1: Human behaviors in meta reasoning for three questions about math (Q1 and Q2) and biology multi-choice (Q3).
  • Figure 2: Overview of AutoMR. Top: Illustration of the search space, an example skeleton sampling process, and the resulting sampled skeleton. Node 0 is the single source node representing the query. Steps (1)(2)(3) show how nodes 1, 2, and 3 are successively added to the partial skeleton. For clarity, we display only 4 nodes, 2 types of meta reasoning strategies (red and blue edges), and the zero option (gray edges); in practice, the number of nodes can be arbitrary as long as the token budget is satisfied, and we implement richer strategies. Bottom: The search space subsumes sequential, parallel, and tree-structured skeletons.
  • Figure 3: The scaling curve of AutoMR and baselines.
  • Figure 4: The training and inference cost and performance of AutoMR and baselines.
  • Figure 5: Searched skeletons for queries from MATH-500 Level 1, MATH-500 Level 5, and Science, respectively.
  • ...and 1 more figure

Theorems & Definitions (5)

  • Proposition 1
  • Definition 1: Meta-Reasoning Skeleton Search Problem
  • Proposition 2
  • proof
  • proof