Cooperative Strategic Planning Enhances Reasoning Capabilities in Large Language Models

Danqing Wang, Zhuorui Ye, Fei Fang, Lei Li

TL;DR

This paper proposes CoPlanner, a cooperative multi-agent reasoning framework that separates reasoning steps and assigns distinct duties to different agents. It demonstrates that guidance from the planning agent and effective cooperation between the agents contribute to CoPlanner's superior performance on multi-step reasoning problems.

Abstract

Enhancing the reasoning capabilities of large language models (LLMs) is crucial for enabling them to tackle complex, multi-step problems. Multi-agent frameworks have shown great potential in enhancing LLMs' reasoning capabilities. However, the lack of effective cooperation between LLM agents hinders their performance, especially for multi-step reasoning tasks. This paper proposes a novel cooperative multi-agent reasoning framework (CoPlanner) by separating reasoning steps and assigning distinct duties to different agents. CoPlanner consists of two LLM agents: a planning agent and a reasoning agent. The planning agent provides high-level strategic hints, while the reasoning agent follows these hints and infers answers. By training the planning agent's policy through the interactive reasoning process via Proximal Policy Optimization (PPO), the LLaMA-3-8B-based CoPlanner outperforms the previous best method by 9.94% on LogiQA and 3.09% on BBH. Our results demonstrate that the guidance from the planning agent and the effective cooperation between the agents contribute to the superior performance of CoPlanner in tackling multi-step reasoning problems.

Paper Structure

This paper contains 18 sections, 1 equation, 6 figures, 6 tables.

Figures (6)

  • Figure 1: CoPlanner consists of two key agents: the reasoning agent and the planning agent. The reasoning agent is responsible for conducting the reasoning process, while the planning agent provides strategic guidance. For each round, the planning agent selects the most appropriate meta-strategy from a pool of meta-strategies based on the historical reasoning process of the reasoning agent. It then generates a detailed hint based on the chosen meta-strategy. This hint is passed to the reasoning agent to guide the next step of reasoning. This interactive process between the two agents continues until the planning agent selects the "finish" strategy, indicating that the reasoning process is complete, or until a maximum number of rounds is reached.
  • Figure 2: The detailed diagram of the planning agent. It takes the query and historical thoughts of the reasoning agent as the observation and obtains several candidate strategies based on the meta-strategy pool. The observation and candidate strategies use the last hidden state of a frozen LLM as their representation. We use PPO to train a critic network to approximate the reward and a policy network to select the best meta-strategy.
  • Figure 3: Performance with different numbers of interactive rounds between agents. Two rounds achieve the best performance.
  • Figure 4: Performance after different numbers of training steps. The x-axis is in thousands of steps.
  • Figure 5: CoPlanner's episode accuracy on LogiQA. The blue curve is the raw data, while the orange curve is the smoothed version.
  • ...and 1 more figure
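
The interaction described in the Figure 1 caption can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`cooperative_reasoning`, `select_strategy`, `write_hint`, `reason_step`), the `FINISH` constant, the "decompose" strategy name, and the stub agents are all hypothetical stand-ins. In the paper, the planning agent's selection is a policy trained with PPO and both agents are LLMs.

```python
from typing import Callable, List

# Hypothetical label for the "finish" meta-strategy named in the Figure 1 caption.
FINISH = "finish"


def cooperative_reasoning(
    query: str,
    select_strategy: Callable[[str, List[str]], str],
    write_hint: Callable[[str, str, List[str]], str],
    reason_step: Callable[[str, List[str], str], str],
    max_rounds: int = 3,
) -> List[str]:
    """Alternate between a planning agent and a reasoning agent.

    The loop stops when the planner selects FINISH or when
    max_rounds reasoning steps have been taken.
    """
    thoughts: List[str] = []
    for _ in range(max_rounds):
        # Planning agent: pick a meta-strategy from the query and history.
        strategy = select_strategy(query, thoughts)
        if strategy == FINISH:
            break
        # Planning agent: turn the chosen meta-strategy into a concrete hint.
        hint = write_hint(strategy, query, thoughts)
        # Reasoning agent: produce the next reasoning step guided by the hint.
        thoughts.append(reason_step(query, thoughts, hint))
    return thoughts


# Stub agents for demonstration only; real agents would call LLMs.
def demo_select(query: str, thoughts: List[str]) -> str:
    return "decompose" if len(thoughts) < 2 else FINISH


def demo_hint(strategy: str, query: str, thoughts: List[str]) -> str:
    return f"[{strategy}] consider step {len(thoughts) + 1}"


def demo_reason(query: str, thoughts: List[str], hint: str) -> str:
    return f"thought {len(thoughts) + 1} following {hint}"


if __name__ == "__main__":
    for t in cooperative_reasoning("Is A implied by B?", demo_select, demo_hint, demo_reason, max_rounds=5):
        print(t)
```

With the stub planner, the loop ends after two reasoning steps because the planner then selects the finish strategy; setting `max_rounds=1` instead ends it via the round limit.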