Think Fast and Slow: Step-Level Cognitive Depth Adaptation for LLM Agents
Ruihan Yang, Fanghua Ye, Xiang We, Ruoqing Zhao, Kang Luo, Xinbo Xu, Bo Zhao, Ruotian Ma, Shanyi Wang, Zhaopeng Tu, Xiaolong Li, Deqing Yang, Linus
TL;DR
CogRouter is a framework, grounded in ACT-R theory, that lets LLM agents adapt their cognitive depth at each step. It defines four cognitive levels and a two-stage training pipeline: CoSFT instills stable level-specific patterns, and CoPO performs step-level credit assignment via confidence-aware advantage reweighting. In experiments on ALFWorld and ScienceWorld, it achieves state-of-the-art task success with substantially lower token usage than fixed-pattern and trajectory-level RL baselines. The approach targets the cognitive rigidity of agents in long-horizon tasks, where some steps need strategic planning and others only routine execution, by allocating depth dynamically according to what each step demands.
Abstract
Large language models (LLMs) are increasingly deployed as autonomous agents for multi-turn decision-making tasks. However, current agents typically rely on fixed cognitive patterns: non-thinking models generate immediate responses, while thinking models engage in deep reasoning uniformly. This rigidity is inefficient for long-horizon tasks, where cognitive demands vary significantly from step to step, with some requiring strategic planning and others only routine execution. In this paper, we introduce CogRouter, a framework that trains agents to dynamically adapt cognitive depth at each step. Grounded in ACT-R theory, we design four hierarchical cognitive levels ranging from instinctive responses to strategic planning. Our two-stage training approach includes Cognition-aware Supervised Fine-tuning (CoSFT) to instill stable level-specific patterns, and Cognition-aware Policy Optimization (CoPO) for step-level credit assignment via confidence-aware advantage reweighting. The key insight is that appropriate cognitive depth should maximize the confidence of the resulting action. Experiments on ALFWorld and ScienceWorld demonstrate that CogRouter achieves state-of-the-art performance with superior efficiency. With Qwen2.5-7B, it reaches an 82.3% success rate, outperforming GPT-4o (+40.3%), OpenAI-o3 (+18.3%), and GRPO (+14.0%), while using 62% fewer tokens.
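The idea of confidence-aware advantage reweighting can be illustrated with a minimal sketch. This is not the paper's CoPO formula, which the abstract does not give. It assumes a single trajectory-level advantage, a hypothetical per-step confidence score for the action produced under each of the four cognitive levels, and a softmax-based weight chosen purely for illustration. The names `reweight_advantages`, `step_confidences`, and `chosen_levels` are all hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def reweight_advantages(trajectory_advantage, step_confidences, chosen_levels):
    """Turn one trajectory-level advantage into per-step advantages.

    ASSUMPTION (illustrative only, not the paper's method):
    step_confidences[t][k] is the confidence of the action at step t
    when it is produced under cognitive level k, and chosen_levels[t]
    is the level the agent actually used at step t.
    """
    per_step = []
    for confidences, level in zip(step_confidences, chosen_levels):
        probs = softmax(confidences)
        # Scale by the number of levels so that equal confidence across
        # all levels gives weight 1.0 and leaves the advantage unchanged.
        weight = probs[level] * len(confidences)
        per_step.append(trajectory_advantage * weight)
    return per_step


# Two steps, four levels each; the agent picked level 0, then level 3.
advantages = reweight_advantages(
    trajectory_advantage=1.0,
    step_confidences=[[0.9, 0.2, 0.2, 0.2], [0.1, 0.1, 0.1, 0.8]],
    chosen_levels=[0, 3],
)
```

In this sketch, a step whose chosen level yields a more confident action than the alternatives receives a weight above 1, so it takes a larger share of the trajectory's credit. The weight simply scales the advantage, so a negative trajectory advantage is scaled in the same way; a real objective would need to decide how failed trajectories should be handled.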
