
Ares: Adaptive Reasoning Effort Selection for Efficient LLM Agents

Jingbo Yang, Bairu Hou, Wei Wei, Yujia Bao, Shiyu Chang

TL;DR

Ares is a framework for per-step dynamic reasoning effort selection tailored to multi-step agent tasks. It employs a lightweight router that predicts the lowest appropriate reasoning level for each step from the interaction history, enabling plug-and-play integration with any LLM agent.

Abstract

Modern agents powered by thinking LLMs achieve high accuracy through long chain-of-thought reasoning but incur substantial inference costs. While many LLMs now support configurable reasoning levels (e.g., high/medium/low), static strategies are often ineffective: using low-effort modes at every step leads to significant performance degradation, while random selection fails to preserve accuracy or provide meaningful cost reduction. Instead, agents should reserve high reasoning effort for difficult steps, such as navigating complex website structures, and use lower-effort modes for simpler steps, such as opening a target URL. In this paper, we propose Ares, a framework for per-step dynamic reasoning effort selection tailored to multi-step agent tasks. Ares employs a lightweight router to predict the lowest appropriate reasoning level for each step based on the interaction history. To train this router, we develop a data generation pipeline that identifies the minimum reasoning effort required for successful step completion. We then fine-tune the router to predict these levels, enabling plug-and-play integration with any LLM agent. We evaluate Ares on a diverse set of agent tasks, including TAU-Bench for tool-use agents, BrowseComp-Plus for deep-research agents, and WebArena for web agents. Experimental results show that Ares reduces reasoning token usage by up to 52.7% compared to fixed high-effort reasoning, while introducing minimal degradation in task success rates.
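The per-step selection described in the abstract can be pictured as a loop in which a router is consulted before every agent step. The following is a minimal sketch under stated assumptions, not the authors' implementation: `run_episode`, `toy_router`, `toy_agent_step`, and the fallback to high effort on malformed router output are all hypothetical names and choices introduced for illustration.

```python
from typing import Callable, List, Tuple

# Effort levels ordered from cheapest to most expensive (from the abstract).
EFFORT_LEVELS = ("low", "medium", "high")


def run_episode(
    router: Callable[[List[str]], str],
    agent_step: Callable[[List[str], str], Tuple[str, bool]],
    task: str,
    max_steps: int = 10,
) -> List[str]:
    """Run one episode, asking the router for an effort level before each step.

    Returns the effort level chosen at each step.
    """
    history = [task]
    chosen = []
    for _ in range(max_steps):
        effort = router(history)
        if effort not in EFFORT_LEVELS:
            # Assumption: fall back to the most conservative level.
            effort = "high"
        action, done = agent_step(history, effort)
        history.append(action)
        chosen.append(effort)
        if done:
            break
    return chosen


# Toy stand-ins for a trained router and an LLM agent (hypothetical).
def toy_router(history: List[str]) -> str:
    # Treat the first step as easy and every later step as hard.
    return "low" if len(history) == 1 else "high"


def toy_agent_step(history: List[str], effort: str) -> Tuple[str, bool]:
    step = len(history)
    # Pretend the task finishes on the third step.
    return f"action_{step}[{effort}]", step >= 3


print(run_episode(toy_router, toy_agent_step, "find the cheapest flight"))
```

Because the router only sees the interaction history and returns a level, the agent itself is untouched, which is one way to read the "plug-and-play" claim.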

Paper Structure (37 sections, 6 equations, 5 figures, 6 tables)


Figures (5)

  • Figure 1: Overview of the Adaptive Reasoning Effort Selection (Ares) framework. Left: Traditional model routing, which often incurs extra inference costs without KV cache reuse. Middle: Our proposed Ares framework dynamically allocates reasoning effort at each step. Right: Ares (red star) achieves the best balance between performance and cost compared to baselines.
  • Figure 2: Overview of the Ares training pipeline. (1) Trajectory Collection: Optimal ground-truth paths are defined by filtering successful trajectories with minimal steps. (2) Effort Annotation: The minimum sufficient reasoning effort for each step is identified via sampling and LLM verification. (3) Rationale Generation: A teacher LLM generates semantic justifications based on task observations and complexity. (4) Supervised Fine-tuning: The Ares router is fine-tuned to jointly predict rationales and effort labels. (5) Reinforcement Learning: The fine-tuned Ares router is further trained using GRPO with outcome, reasoning cost, and format rewards.
  • Figure 3: Selection of reasoning effort by Ares on the WebArena benchmark. Left: Percentage distribution of low, medium, and high effort levels across task step indices. Right: Distribution of effort levels categorized by specific action types.
  • Figure 4: Evolution of Ares reasoning effort prediction during GRPO training (TAU-Bench Airline).
  • Figure 5: Comparison of medium-effort (left) and high-effort (right) ratios when training with and without the normalized reasoning cost reward.
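The effort annotation step in Figure 2 labels each step with the minimum sufficient reasoning effort. A minimal sketch of that idea follows, assuming a hypothetical `step_succeeds` callback that stands in for the sampling and LLM verification mentioned in the caption. The function name and the fallback to the highest level when no level passes are illustrative assumptions, not details confirmed by the paper.

```python
from typing import Callable, Sequence

# Effort levels ordered from cheapest to most expensive.
EFFORT_LEVELS = ("low", "medium", "high")


def min_sufficient_effort(
    step_succeeds: Callable[[str], bool],
    levels: Sequence[str] = EFFORT_LEVELS,
) -> str:
    """Return the cheapest level at which the step is judged successful.

    `step_succeeds(level)` is a hypothetical check, e.g. sample the agent at
    that level and have a verifier compare the result to the reference step.
    """
    for level in levels:
        if step_succeeds(level):
            return level
    # Assumption: if no level passes, label the step with the highest effort.
    return levels[-1]


# Example: a step that fails at low effort but succeeds from medium upward.
label = min_sufficient_effort(lambda level: level != "low")
print(label)  # medium
```

Scanning from the cheapest level upward keeps the label tied to the lowest level that works, which matches the "lowest appropriate reasoning level" target the router is trained to predict.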