Searching Meta Reasoning Skeleton to Guide LLM Reasoning
Ziying Zhang, Yaqing Wang, Quanming Yao
TL;DR
This work introduces AutoMR, a framework that automatically discovers query-aware meta-reasoning skeletons for LLMs. It represents skeletons as single-source, edge-heterogeneous DAGs and searches a DAG-based search space under a token budget. A dynamic skeleton sampling algorithm expands the skeleton node by node during inference and conditions each edge choice on the evolving base reasoning context, so the skeleton adapts efficiently to each query. The skeleton policy is trained with REINFORCE. On math Q&A and general multiple-choice tasks, AutoMR achieves higher reasoning accuracy and better scaling efficiency than manually designed skeletons and AutoML baselines, reliably outperforming existing meta-reasoning approaches while keeping training and inference costs practical. These results highlight its potential for query-specific, context-adaptive reasoning guidance.
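The TL;DR states that the skeleton policy is trained with REINFORCE. As a minimal sketch of what such an update looks like, the Python below trains a categorical policy over edge types with the score-function gradient and a moving-average baseline. Everything here is an assumption for illustration. The edge-type names in `EDGE_TYPES`, the helpers `sample_edges`, `reinforce_step` and `toy_reward`, and the reward itself (a stand-in for answer correctness) are all hypothetical. Unlike AutoMR, this policy also ignores the reasoning context entirely.

```python
import math
import random

# Hypothetical edge vocabulary; the paper's actual edge types may differ.
EDGE_TYPES = ["none", "decompose", "reflect", "verify"]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sample_edges(logits, num_decisions, rng):
    """Sample one edge type per decision from a shared, context-free categorical policy."""
    probs = softmax(logits)
    return [rng.choices(range(len(probs)), weights=probs)[0] for _ in range(num_decisions)]

def reinforce_step(logits, actions, reward, baseline, lr):
    """One REINFORCE update: logits += lr * (R - b) * sum_t grad log pi(a_t)."""
    probs = softmax(logits)
    advantage = reward - baseline
    grad = [0.0] * len(logits)
    for a in actions:
        for k in range(len(logits)):
            # d/d logit_k of log softmax(logits)[a] = 1[k == a] - p_k
            grad[k] += (1.0 if k == a else 0.0) - probs[k]
    return [w + lr * advantage * g for w, g in zip(logits, grad)]

def toy_reward(actions):
    """Hypothetical stand-in for answer correctness: pretend 'verify' edges help."""
    verify = EDGE_TYPES.index("verify")
    return sum(a == verify for a in actions) / len(actions)

def train(steps=500, num_decisions=4, lr=0.5, seed=0):
    rng = random.Random(seed)
    logits = [0.0] * len(EDGE_TYPES)
    baseline = 0.0
    for _ in range(steps):
        actions = sample_edges(logits, num_decisions, rng)
        reward = toy_reward(actions)
        logits = reinforce_step(logits, actions, reward, baseline, lr)
        # Moving-average baseline, computed from past rewards only.
        baseline = 0.9 * baseline + 0.1 * reward
    return softmax(logits)

if __name__ == "__main__":
    for name, p in zip(EDGE_TYPES, train()):
        print(f"{name}: {p:.3f}")
```

Because the baseline is computed only from earlier rewards, subtracting it leaves the gradient estimate unbiased while reducing its variance. Under this toy reward the policy's mass shifts toward `"verify"`.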
Abstract
Meta-reasoning behaviors act as a skeleton that guides large language model (LLM) reasoning, thereby helping to improve reasoning performance. However, prior research implements meta-reasoning skeletons with manually designed structures. This limits their ability to adapt to query-specific requirements and to capture intricate logical dependencies among reasoning steps. To address these challenges, we represent a meta-reasoning skeleton as a directed acyclic graph (DAG), which unifies the skeletons proposed in prior work and models intricate logical dependencies. We then propose AutoMR, a framework inspired by automated machine learning (AutoML) that automatically searches for query-aware meta-reasoning skeletons. Specifically, we construct a search space based on the DAG representation of skeletons and formulate the corresponding search problem. We design a dynamic skeleton sampling algorithm that expands the meta-reasoning skeleton along with the reasoning context at inference time. This algorithm can efficiently derive any meta-reasoning skeleton in the search space and adapts the skeleton to the evolving base reasoning context, thereby enabling efficient query-aware skeleton search. We conduct experiments on a wide range of benchmark datasets. The results show that AutoMR broadly achieves better reasoning performance than previous work.
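The abstract describes a skeleton DAG that is expanded together with the reasoning context at inference time. The Python below is a minimal sketch of that idea: node 0 holds the query, and each new node picks an edge type from every existing node before its content is generated. Sampling stops once a token budget is reached. All names are hypothetical, including `Skeleton`, `choose_edge`, `generate_step`, `dynamic_sample` and the edge types. `choose_edge` is a uniform random stub standing in for AutoMR's learned, context-conditioned policy, and `generate_step` stands in for the base LLM. Tokens are counted crudely as whitespace-separated words. The fallback edge used when every choice is `"none"` is also an assumption.

```python
import random
from dataclasses import dataclass, field

# Hypothetical edge vocabulary; "none" means no dependency on that node.
EDGE_TYPES = ("none", "decompose", "reflect", "verify")

@dataclass
class Skeleton:
    """Single-source DAG: node 0 is the query; edges only point to newer nodes."""
    contents: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (src, dst, edge_type) triples

    def add_node(self, content, parents):
        dst = len(self.contents)
        self.contents.append(content)
        for src, etype in parents:
            assert src < dst  # only earlier nodes can be parents, so no cycles
            self.edges.append((src, dst, etype))
        return dst

def choose_edge(context, src, rng):
    """Stub for a learned policy pi(edge_type | context, src); here uniform random."""
    return rng.choice(EDGE_TYPES)

def generate_step(skeleton, parents):
    """Stub for the base LLM writing the next reasoning step."""
    tags = ",".join(f"{e}<-{s}" for s, e in parents)
    return f"step {len(skeleton.contents)} [{tags}]"

def dynamic_sample(query, token_budget, rng):
    sk = Skeleton()
    sk.add_node(query, [])
    used = len(query.split())  # crude whitespace token count
    while used < token_budget:
        context = " ".join(sk.contents)  # evolving reasoning context
        parents = [(i, choose_edge(context, i, rng)) for i in range(len(sk.contents))]
        parents = [(i, e) for i, e in parents if e != "none"]
        if not parents:  # hypothetical fallback: keep every node reachable from node 0
            parents = [(len(sk.contents) - 1, "decompose")]
        step = generate_step(sk, parents)
        sk.add_node(step, parents)
        used += len(step.split())
    return sk

if __name__ == "__main__":
    sk = dynamic_sample("What is 2 + 3?", token_budget=40, rng=random.Random(0))
    for src, dst, etype in sk.edges:
        print(f"{src} -{etype}-> {dst}")
```

Each new node receives edges only from nodes that already exist, so insertion order is a topological order and the graph stays acyclic without an explicit cycle check. Every new node also gets at least one parent, so every node is reachable from the query node and the DAG keeps a single source. A sequential skeleton is the special case in which each node's only parent is the node just before it.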
