Reflecting with Two Voices: A Co-Adaptive Dual-Strategy Framework for LLM-Based Agent Decision Making

Wentao Zhang, Qunbo Wang, Tao Zhang, Junsheng Wu, Hongping Gan, Yang Liu, Ling Dai, Shizhuang Deng, Shuntong Sun

TL;DR

DuSAR addresses brittleness and high overhead in LLM-based agents by internalizing a co-adaptive dual-strategy framework within a frozen LLM. It pairs a Holistic Strategy for long-horizon planning with a Local Strategy for context-grounded execution, connected by a lightweight Strategy Integration Module and guided by a Strategy Fitness Score. Without demonstrations, DuSAR achieves state-of-the-art results on ALFWorld and Mind2Web with open-source LLMs and substantially reduces per-step token usage, while ablations confirm the necessity of dual-strategy coordination. The framework also demonstrates compatibility with optional expert demonstrations, highlighting its flexibility and practical potential for robust, efficient autonomous reasoning in dynamic environments.

Abstract

Large language model (LLM) agents often rely on external demonstrations or retrieval-augmented planning, which leads to brittleness, poor generalization, and high computational overhead. Inspired by human problem-solving, we propose DuSAR (Dual-Strategy Agent with Reflecting), a demonstration-free framework that enables a single frozen LLM to perform co-adaptive reasoning via two complementary strategies: a high-level holistic plan and a context-grounded local policy. These strategies interact through a lightweight reflection mechanism: the agent continuously assesses progress via a Strategy Fitness Score, revising its global plan when stuck and refining it upon meaningful advancement, mimicking human metacognitive behavior. On ALFWorld and Mind2Web, DuSAR achieves state-of-the-art performance with open-source LLMs (7B-70B), reaching 37.1% success on ALFWorld with Llama3.1-70B, more than double the best prior result (13.0%), and 4.02% on Mind2Web, also more than double the strongest baseline. It also reduces per-step token consumption by 3-9× while maintaining strong performance. Ablation studies confirm the necessity of dual-strategy coordination. Moreover, optional integration of expert demonstrations further boosts results, highlighting DuSAR's flexibility and compatibility with external knowledge.

Paper Structure

This paper contains 43 sections, 5 equations, 4 figures, 5 tables, 1 algorithm.

Figures (4)

  • Figure 1: The overall architecture of DuSAR, illustrating its iterative dual-strategy reasoning in ALFWorld. Starting from a task instruction, the Holistic Strategy generates a high-level plan (①), which guides the initial action (②). After execution, environmental feedback and the current plan inform the Local Strategy to produce context-aware guidance (③) and evaluate progress via a Strategy Fitness Score (④). This evaluation triggers dynamic refinement of the Holistic Strategy (⑤). A Strategy Integration Module (SIM) synthesizes both strategies to select the next action (⑥), forming a co-adaptive loop that balances long-term coherence with immediate adaptability. See Fig. 2 for details on the reflecting mechanism.
  • Figure 2: DuSAR’s reflecting mechanism. Holistic Reflecting formulates a long-term strategy from the task instruction, past exploration traces, and prior local strategies. Local Reflecting generates context-aware actions and evaluates the feasibility of the current holistic plan based on environmental observations. Decision Reflecting synthesizes both strategies to select the next feasible action. All interactions are logged as structured Explore Traces for iterative refinement.
  • Figure 3: Module ablation: Holistic-only (DuSAR OH), Local-only (DuSAR OL), and naive concatenation (DuSAR NA) on ALFWorld (All SR) and Mind2Web Cross-Task (Step SR).
  • Figure 4: Ablation results of expert demonstration application on holistic (DuSAR HT), local (DuSAR LT), and bidirectional integration (DuSAR BT) on ALFWorld (All SR) and Mind2Web Cross-Task (Step SR).
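
The six steps in the Figure 1 caption can be read as a control loop. The Python sketch below is only an illustration of that reading, not the authors' implementation. Every name in it (`run_episode`, `Trace`, the role strings passed to `llm`, `fake_llm`, `make_fake_env`) is hypothetical, and so are the two thresholds `stuck_below` and `advance_above`: this summary does not say how the Strategy Fitness Score is scaled or when a revision is triggered. The frozen LLM is represented by a plain callable so the example runs without any model.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Trace:
    """One logged step: action taken, observation received, fitness score."""
    action: str
    observation: str
    fitness: float


@dataclass
class AgentState:
    task: str
    holistic: str = ""
    traces: List[Trace] = field(default_factory=list)


def run_episode(
    task: str,
    llm: Callable[[str, str], str],               # (role, prompt) -> text; stands in for the frozen LLM
    env_step: Callable[[str], Tuple[str, bool]],  # action -> (observation, done)
    max_steps: int = 10,
    stuck_below: float = 0.3,    # assumed threshold, not taken from the paper
    advance_above: float = 0.7,  # assumed threshold, not taken from the paper
) -> AgentState:
    state = AgentState(task=task)
    # (1) Holistic Strategy: high-level plan from the task instruction.
    state.holistic = llm("holistic", f"Task: {task}")
    # (2) The plan guides the initial action.
    action = llm("decide", f"Plan: {state.holistic}")
    for _ in range(max_steps):
        observation, done = env_step(action)
        context = f"Plan: {state.holistic}\nObservation: {observation}"
        # (3) Local Strategy: context-aware guidance from feedback and plan.
        local = llm("local", context)
        # (4) Strategy Fitness Score: how well the plan fits what was observed.
        fitness = float(llm("fitness", context))
        state.traces.append(Trace(action, observation, fitness))
        if done:
            break
        # (5) Revise the plan when stuck, refine it on clear progress.
        if fitness < stuck_below:
            state.holistic = llm("revise", f"{context}\nLocal: {local}")
        elif fitness > advance_above:
            state.holistic = llm("refine", f"{context}\nLocal: {local}")
        # (6) Integration step: combine both strategies to pick the next action.
        action = llm("decide", f"Plan: {state.holistic}\nLocal: {local}")
    return state


def fake_llm(role: str, prompt: str) -> str:
    """Canned responses so the sketch runs without a model."""
    canned = {
        "holistic": "find the mug, then put it on the desk",
        "local": "the mug is probably inside a cabinet",
        "fitness": "0.2",
        "revise": "search the cabinets first",
        "refine": "carry the mug to the desk",
        "decide": "open cabinet 1",
    }
    return canned[role]


def make_fake_env(finish_after: int) -> Callable[[str], Tuple[str, bool]]:
    """Toy environment that reports success after a fixed number of steps."""
    counter = {"n": 0}

    def step(action: str) -> Tuple[str, bool]:
        counter["n"] += 1
        return f"observation {counter['n']}", counter["n"] >= finish_after

    return step


if __name__ == "__main__":
    result = run_episode("put a mug on the desk", fake_llm, make_fake_env(3))
    for t in result.traces:
        print(t.action, "|", t.observation, "|", t.fitness)
```

Passing the model and environment in as callables keeps the loop independent of any particular LLM client or benchmark wrapper. In a real setup, each role string would map to a separate prompt sent to the same frozen model.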