Reflecting with Two Voices: A Co-Adaptive Dual-Strategy Framework for LLM-Based Agent Decision Making
Wentao Zhang, Qunbo Wang, Tao Zhang, Junsheng Wu, Hongping Gan, Yang Liu, Ling Dai, Shizhuang Deng, Shuntong Sun
TL;DR
DuSAR addresses brittleness and high overhead in LLM-based agents by internalizing a co-adaptive dual-strategy framework within a frozen LLM. It pairs a Holistic Strategy for long-horizon planning with a Local Strategy for context-grounded execution, connected by a lightweight Strategy Integration Module and guided by a Strategy Fitness Score. Without demonstrations, DuSAR achieves state-of-the-art results on ALFWorld and Mind2Web with open-source LLMs and reduces per-step token usage by 3-9x, while ablations confirm the necessity of dual-strategy coordination. The framework is also compatible with optional expert demonstrations, which further improve results, highlighting its flexibility and practical potential for robust, efficient autonomous reasoning in dynamic environments.
Abstract
Large language model (LLM) agents often rely on external demonstrations or retrieval-augmented planning, leading to brittleness, poor generalization, and high computational overhead. Inspired by human problem-solving, we propose DuSAR (Dual-Strategy Agent with Reflecting), a demonstration-free framework that enables a single frozen LLM to perform co-adaptive reasoning via two complementary strategies: a high-level holistic plan and a context-grounded local policy. These strategies interact through a lightweight reflection mechanism: the agent continuously assesses progress via a Strategy Fitness Score, revising its global plan when stuck and refining it upon meaningful advancement, mimicking human metacognitive behavior. On ALFWorld and Mind2Web, DuSAR achieves state-of-the-art performance with open-source LLMs (7B-70B), reaching 37.1% success on ALFWorld with Llama3.1-70B, more than double the best prior result (13.0%), and 4.02% on Mind2Web, also more than double the strongest baseline. It also reduces per-step token consumption by 3-9x while maintaining strong performance. Ablation studies confirm the necessity of dual-strategy coordination. Moreover, optional integration of expert demonstrations further boosts results, highlighting DuSAR's flexibility and compatibility with external knowledge.
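The reflection step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the Strategy Fitness Score is computed or what range it takes, so the [0, 1] scale, the `low` and `high` thresholds, and the names `DualStrategyState`, `reflect`, `revise`, and `refine` are all assumptions made for illustration. In a real agent, `revise` and `refine` would presumably be prompts to the same frozen LLM; here they are plain callables.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class DualStrategyState:
    """Hypothetical container for the agent's current holistic plan."""
    holistic_plan: str
    history: List[str] = field(default_factory=list)


def reflect(
    state: DualStrategyState,
    fitness: float,
    revise: Callable[[str], str],
    refine: Callable[[str], str],
    low: float = 0.3,   # assumed threshold: below this the agent is "stuck"
    high: float = 0.7,  # assumed threshold: at or above this, progress is "meaningful"
) -> str:
    """Update the holistic plan from a fitness score; return the action taken."""
    if fitness < low:
        # Stuck: replace the global plan.
        state.holistic_plan = revise(state.holistic_plan)
        action = "revise"
    elif fitness >= high:
        # Meaningful advancement: keep the plan but sharpen it.
        state.holistic_plan = refine(state.holistic_plan)
        action = "refine"
    else:
        # Ambiguous progress: leave the plan unchanged.
        action = "keep"
    state.history.append(action)
    return action
```

With stub callables, a low score triggers a plan revision, a high score triggers a refinement, and anything in between leaves the plan untouched. The middle "keep" band is a design choice of this sketch, not something stated in the abstract.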
