
PoAct: Policy and Action Dual-Control Agent for Generalized Applications

Guozhi Yuan, Youfeng Liu, Jingli Yang, Wei Jia, Kai Lin, Yansong Gao, Shan He, Zilin Ding, Haitao Li

TL;DR

PoAct addresses the misalignment between planning and action in ReAct-like agents by introducing a Policy Controller and an Action Controller that dynamically switch reasoning policies and adjust the action space. It extends the ReAct Code Agent with this dual-control framework, together with a RAG-based tool manager and a Path Reviewer that keep reasoning paths robust. Evaluations on LegalAgentBench and AgentBench across GPT-4o, GLM-4, and Qwen-2.5 show improved success rates and lower token usage, especially on multi-hop and complex tasks. The results indicate strong generalizability and scalability for generalized applications that require high-quality code actions.
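The Action Controller's RAG-based tool manager narrows which tools the model can see at each step. The sketch below is a minimal illustration under stated assumptions, not PoAct's implementation: the tool names and descriptions in TOOLS are hypothetical, and a bag-of-words cosine similarity stands in for the embedding retrieval that a real RAG tool manager would use.

```python
import math
import re
from collections import Counter

# Hypothetical tool registry (name -> description); not PoAct's actual tools.
TOOLS = {
    "get_company_info": "look up registered company name, legal representative and capital",
    "get_court_case": "retrieve court case records, case number and judgment",
    "get_law_firm": "find law firm address and contact information",
    "calculate": "evaluate an arithmetic expression and return the number",
}

# Small stopword list so function words do not drive the ranking.
STOPWORDS = {"a", "an", "and", "for", "is", "of", "the", "this", "up", "who", "what"}


def bag_of_words(text: str) -> Counter:
    """Lower-cased word counts; a stand-in for a real embedding model."""
    return Counter(w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS)


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def visible_tools(subgoal: str, k: int = 2) -> list[str]:
    """Return up to k tools whose descriptions best match the current sub-goal."""
    query = bag_of_words(subgoal)
    scored = sorted(
        ((cosine(query, bag_of_words(desc)), name) for name, desc in TOOLS.items()),
        reverse=True,
    )
    # Hide tools with no lexical overlap at all.
    return [name for score, name in scored[:k] if score > 0]


print(visible_tools("Who is the legal representative of this company?"))  # ['get_company_info']
```

Exposing only the matching tools keeps the prompt short. Figure 2 notes that the Action Controller also switches few-shot examples, which could plausibly be retrieved the same way.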

Abstract

Owing to their strong comprehension and reasoning capabilities, agent frameworks driven by Large Language Models (LLMs) have achieved significant success in numerous complex reasoning tasks. ReAct-like agents solve intricate problems step by step through progressive planning and tool calls, iteratively refining each new step based on environmental feedback. However, as the planning capabilities of LLMs improve, the actions invoked by tool calls in ReAct-like frameworks often fail to match complex plans and challenging data organization. Code Action addresses these issues but introduces its own challenges: a more complex action space and more difficult action organization. To leverage Code Action while managing its complexity, this paper proposes the Policy and Action Dual-Control Agent (PoAct) for generalized applications. PoAct aims to produce higher-quality code actions and more accurate reasoning paths by dynamically switching reasoning policies and modifying the action space. Experimental results on agent benchmarks covering both legal and generic scenarios demonstrate the superior reasoning capabilities and reduced token consumption of our approach on complex tasks. On LegalAgentBench, our method achieves a 20 percent improvement over the baseline while requiring fewer tokens. We conducted experiments and analyses on the GPT-4o and GLM-4 series models, demonstrating the significant potential and scalability of our approach for solving complex problems.
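To make the contrast with single-step tool calls concrete, here is a minimal sketch of a code action, assuming tools are exposed to the model as plain Python functions: one generated snippet chains two dependent lookups (a multi-hop query) plus an aggregation, which a one-call-per-step ReAct agent would spread over several turns. The function get_company_info, its toy data, and run_code_action are hypothetical. Emptying __builtins__ only limits which names the snippet can call; it is not a security sandbox.

```python
# Hypothetical tool with toy data; PoAct's real tools are not specified here.
def get_company_info(name: str) -> dict:
    records = {
        "Acme Ltd": {"capital": 500, "parent": "Beta Holdings"},
        "Beta Holdings": {"capital": 2000, "parent": None},
    }
    return records[name]


def run_code_action(code: str, tools: dict) -> dict:
    """Execute one generated code action with only the given tools in scope."""
    namespace = {"__builtins__": {}, **tools}  # restricts names, NOT a sandbox
    exec(code, namespace)
    return namespace


# One code action chains two dependent lookups and an aggregation.
code_action = """
company = get_company_info("Acme Ltd")
parent = get_company_info(company["parent"])
result = company["capital"] + parent["capital"]
"""

namespace = run_code_action(code_action, {"get_company_info": get_company_info})
print(namespace["result"])  # 2500
```

The intermediate variable parent carries data between the two lookups inside a single action, which is the kind of data organization that separate tool calls handle poorly.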

Paper Structure

This paper contains 25 sections, 9 figures, and 4 tables.

Figures (9)

  • Figure 1: The figure illustrates the mismatch between the single-step action space and the planning capabilities of LLMs. Code actions help mitigate this problem to some extent.
  • Figure 2: The framework diagram of PoAct illustrates how PoAct adjusts its reasoning policy based on the reasoning step via the Policy Controller. It employs the Action Controller to dynamically switch the visible tools and few-shot examples, while also evaluating anomalous reasoning paths.
  • Figure 3: This diagram illustrates how PoAct addresses user queries using the ReAct Code paradigm.
  • Figure 4: The Path Reviewer prevents PoAct from entering anomalous reasoning paths through three tasks.
  • Figure 5:
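The two control points in Figure 2 can be illustrated with a toy sketch: a Policy Controller that picks a reasoning policy from the current reasoning step, and a Path Reviewer check that flags an anomalous path. Everything here is a hypothetical assumption rather than PoAct's method: the captions do not specify the actual switching rules or the Path Reviewer's three tasks, so choose_policy simply plans first, acts in the middle, and forces an answer at the step budget, while is_anomalous only detects the same action repeated back to back.

```python
def choose_policy(step: int, max_steps: int) -> str:
    """Hypothetical Policy Controller rule keyed on the reasoning step."""
    if step == 0:
        return "plan"  # first step: draft a global plan
    if step >= max_steps - 1:
        return "answer"  # last allowed step: stop acting and answer
    return "act"  # intermediate steps: emit code actions


def is_anomalous(actions: list[str], window: int = 2) -> bool:
    """Hypothetical Path Reviewer check: the same action repeated `window` times in a row."""
    recent = actions[-window:]
    return len(recent) == window and len(set(recent)) == 1


print([choose_policy(step, 5) for step in range(5)])
# ['plan', 'act', 'act', 'act', 'answer']
print(is_anomalous(["plan", "act:get_company_info", "act:get_company_info"]))  # True
print(is_anomalous(["plan", "act:get_company_info", "act:get_court_case"]))  # False
```

In this sketch a flagged path is only detected, not repaired; how PoAct recovers from an anomalous path is not described in this section.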