DynaAct: Large Language Model Reasoning with Dynamic Action Spaces

Xueliang Zhao, Wei Wu, Jian Guan, Qintong Li, Lingpeng Kong

TL;DR

DynaAct tackles the challenge of designing scalable yet compact action spaces for LLM-driven sequential reasoning. It automatically constructs a proxy action space from a broad problem corpus, then selects a small, diverse subset at each reasoning step via a learned submodular objective. The objective combines a utility term tied to expected rewards with a diversity term and is maximized greedily; MCTS estimates action values while the base LLM stays frozen. An embedding-based Q-learning objective guides the utility embedding, yielding compact, informative action sets that improve performance across six benchmarks, notably on math and reasoning tasks, with only modest latency increases. The approach generalizes well across domains and offers a practical, scalable path to stronger multi-step reasoning with dynamically constructed action spaces.
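The TL;DR says an embedding-based Q-learning objective shapes the utility embedding while the base LLM stays frozen, but the loss itself is not given here. The following is a minimal sketch under stated assumptions: Q(s, a) is the dot product of a fixed state embedding phi(s) and a trainable action embedding psi(a), and psi(a) is updated with the standard one-step semi-gradient Q-learning rule. The bilinear form and the names `q_value` and `q_learning_step` are hypothetical, not DynaAct's code.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(u, v))


def q_value(state_emb, action_emb):
    """Assumed (hypothetical) bilinear utility: Q(s, a) = <phi(s), psi(a)>."""
    return dot(state_emb, action_emb)


def q_learning_step(action_embs, s, a, r, s_next, done, gamma=0.9, lr=0.1):
    """One semi-gradient Q-learning update of psi(a); phi(s) stays fixed.

    action_embs maps action name -> trainable embedding psi(a) (list of floats).
    s and s_next are fixed state embeddings, mirroring the frozen base LLM.
    Returns the TD error computed before the update.
    """
    # Bootstrapped target: r + gamma * max_a' Q(s', a'), or just r at episode end.
    target = r
    if not done:
        target += gamma * max(q_value(s_next, e) for e in action_embs.values())
    td_error = target - q_value(s, action_embs[a])
    # Gradient step on 0.5 * td_error**2 w.r.t. psi(a), treating the target as constant.
    action_embs[a] = [w + lr * td_error * x for w, x in zip(action_embs[a], s)]
    return td_error
```

Repeating a terminal update with reward 1 drives Q(s, a) toward 1 geometrically, shrinking the error by a factor of 1 - lr * |phi(s)|^2 per step when that factor lies in (0, 1). Only the embedding of the updated action changes.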

Abstract

In modern sequential decision-making systems, the construction of an optimal candidate action space is critical to efficient inference. However, existing approaches either rely on manually defined action spaces that lack scalability or utilize unstructured spaces that render exhaustive search computationally prohibitive. In this paper, we propose a novel framework named DynaAct for automatically constructing a compact action space to enhance sequential reasoning in complex problem-solving scenarios. Our method first estimates a proxy for the complete action space by using large language models to extract general sketches from a corpus covering diverse complex reasoning problems. We then formulate a submodular function that jointly evaluates candidate actions based on their utility to the current state and their diversity, and employ a greedy algorithm to select an optimal candidate set. Extensive experiments on six diverse standard benchmarks demonstrate that our approach significantly improves overall performance while maintaining efficient inference without introducing substantial latency. The implementation is available at https://github.com/zhaoxlpku/DynaAct.
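The abstract describes selecting candidates by greedily maximizing a submodular function of utility and diversity. The exact terms are defined in the paper's equations, which are not reproduced here, so this sketch assumes two illustrative components. The first is a per-action utility score u(a; s_t) summed over the set. The second is a facility-location diversity term, lam * sum_j max_{a in A} max(0, cos(e_j, e_a)), over the proxy action pool, which rewards sets that cover the pool rather than penalizing similarity directly. Both choices, and names such as `greedy_select`, are assumptions for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    if nu == 0 or nv == 0:
        return 0.0
    return sum(x * y for x, y in zip(u, v)) / (nu * nv)


def objective(selected, utility, embs, lam):
    """Assumed F(A) = sum of utilities + lam * facility-location coverage of the pool."""
    util = sum(utility[a] for a in selected)
    if not selected:
        return util
    # Each pool item j is "covered" by its most similar selected action (clipped at 0).
    cover = sum(
        max(max(0.0, cosine(embs[j], embs[a])) for a in selected) for j in embs
    )
    return util + lam * cover


def greedy_select(utility, embs, m, lam=1.0):
    """Add the action with the largest marginal gain until m actions are chosen."""
    selected = []
    remaining = set(embs)
    current = 0.0
    while remaining and len(selected) < m:
        best, best_gain = None, -math.inf
        for a in sorted(remaining):  # sorted for deterministic tie-breaking
            gain = objective(selected + [a], utility, embs, lam) - current
            if gain > best_gain:
                best, best_gain = a, gain
        selected.append(best)
        remaining.remove(best)
        current += best_gain
    return selected
```

With `utility = {"a1": 1.0, "a2": 0.95, "b": 0.5}` and `embs = {"a1": [1, 0], "a2": [0.99, 0.1], "b": [0, 1]}`, `greedy_select(utility, embs, 2)` returns `["a2", "b"]`, whereas `lam=0.0` returns `["a1", "a2"]`. The diversity term replaces a near-duplicate with a distinct action. When utilities are nonnegative, this particular F is monotone and submodular, so greedy selection under a size limit m carries the classic (1 - 1/e) approximation guarantee.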

Paper Structure

This paper contains 39 sections, 1 theorem, 41 equations, 5 figures, 12 tables.

Key Result

Lemma 1

Given the definitions of the relevance term $f_{\mathrm{util}}(\mathcal{A}_t; s_t)$ in Eq. eq:submodular_util and the diversity term $f_{\mathrm{div}}(\mathcal{A}_t)$ in Eq. eq:submodular_div, the function $F(\mathcal{A}_t; s_t)$ defined in Eq. eq:submodular is submodular with respect to the candidate …
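Lemma 1 asserts submodularity, i.e. diminishing returns: F(A ∪ {x}) - F(A) ≥ F(B ∪ {x}) - F(B) whenever A ⊆ B and x ∉ B. The paper's utility and diversity equations are not reproduced here, so this sketch checks the property by brute force for an assumed stand-in F: a modular utility plus a facility-location term with similarities in [0, 1]. It is a sanity check on toy instances, not a proof and not the paper's exact objective. The helpers `make_objective`, `is_submodular`, and `random_instance` are hypothetical.

```python
import itertools
import random


def make_objective(utility, sim, lam):
    """Assumed F(A) = sum_{a in A} u(a) + lam * sum_j max_{a in A} sim[j][a]."""
    items = range(len(utility))

    def F(A):
        util = sum(utility[a] for a in A)
        cover = sum(max((sim[j][a] for a in A), default=0.0) for j in items)
        return util + lam * cover

    return F


def is_submodular(F, n, tol=1e-12):
    """Brute-force check of diminishing returns over all A ⊆ B ⊆ {0..n-1}, x ∉ B."""
    ground = range(n)
    subsets = [
        frozenset(c) for r in range(n + 1) for c in itertools.combinations(ground, r)
    ]
    for B in subsets:
        for A in subsets:
            if not A <= B:
                continue
            for x in ground:
                if x in B:
                    continue
                gain_A = F(A | {x}) - F(A)
                gain_B = F(B | {x}) - F(B)
                if gain_A < gain_B - tol:
                    return False
    return True


def random_instance(n, seed=0):
    """Toy instance: random utilities, nonnegative similarities with sim[i][i] = 1."""
    rng = random.Random(seed)
    utility = [rng.random() for _ in range(n)]
    sim = [[1.0 if i == j else rng.random() for j in range(n)] for i in range(n)]
    return utility, sim
```

For example, `is_submodular(make_objective(*random_instance(5), lam=0.7), 5)` holds, whereas a supermodular function such as `lambda A: len(A) ** 2` fails the check. The similarities must be nonnegative here: with negative values and an empty-set value of 0, the facility-location term can violate diminishing returns.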

Figures (5)

  • Figure 1: Overview of the proposed method. Given the proxy action space $\mathcal{A}$, the method searches for the subset $\mathcal{A}_t$ that maximizes the submodular function, which consists of a utility term and a diversity term. The subset $\mathcal{A}_t$ is then used for the subsequent reasoning steps.
  • Figure 2: A comparison of RAP and DynaAct with respect to different action space sizes (i.e., $m$). The $x$-axis indicates the number of rollouts, while the $y$-axis shows the accuracy on MATH-500.
  • Figure 3: Case study: solution to alternating series sum.
  • Figure 4: Case study: solution to Greek army battalion formation problem.
  • Figure 5: Case study: solution to inscribed hexagon angle problem.
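Figure 2 plots MATH-500 accuracy against the number of MCTS rollouts for different candidate-set sizes m. To make "rollouts over a fixed candidate set" concrete, here is a hypothetical single-node sketch of UCT (UCB1) selection. Each rollout picks the candidate with the highest mean value plus an exploration bonus, calls a user-supplied `simulate` function for a reward, and updates visit statistics. It is not a full tree search and not DynaAct's implementation; `simulate`, `run_rollouts`, and the constant c = 1.4 are assumptions.

```python
import math


def uct_select(stats, total_visits, c=1.4):
    """Return an unvisited candidate if any, else argmax of mean + c*sqrt(ln N / n)."""
    for action, (visits, _) in stats.items():
        if visits == 0:
            return action

    def score(action):
        visits, value_sum = stats[action]
        return value_sum / visits + c * math.sqrt(math.log(total_visits) / visits)

    return max(stats, key=score)


def run_rollouts(candidates, simulate, num_rollouts, c=1.4):
    """Spend num_rollouts on the candidate set; return {action: (visits, mean value)}."""
    stats = {a: (0, 0.0) for a in candidates}
    for t in range(num_rollouts):
        a = uct_select(stats, t, c)  # t equals the total number of visits so far
        reward = simulate(a)  # hypothetical: e.g. a rollout scored by a reward model
        visits, value_sum = stats[a]
        stats[a] = (visits + 1, value_sum + reward)
    return {a: (v, s / v if v else 0.0) for a, (v, s) in stats.items()}
```

With a deterministic toy reward `{"a": 0.2, "b": 0.5, "c": 0.8}.get` as `simulate`, 500 rollouts concentrate most visits on `"c"` while still sampling the weaker candidates. In this toy setting, enlarging the candidate set spreads a fixed rollout budget over more arms.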

Theorems & Definitions (2)

  • Lemma 1
  • Proof