Learning to Configure Agentic AI Systems
Aditya Taparia, Som Sagar, Ransalu Senanayake
TL;DR
This work introduces ARC, a hierarchical reinforcement learning framework that configures LLM-based agent systems on a per-query basis by jointly selecting workflows, tools, token budgets, and prompts. The architecture splits decision-making into a structure policy and a prompt policy, trained with proximal policy optimization (PPO) on shaped rewards, followed by a supervised fine-tuning (SFT) refinement step that concentrates the policy on elite configurations. Across multiple reasoning and tool-use benchmarks, ARC outperforms static templates, grid and greedy search, and flat RL baselines while reducing token usage and runtime. The approach offers a scalable, adaptable alternative to one-size-fits-all designs; its transfer behavior favors cross-task structural generalization, and performance scales positively with model capacity.
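To make the two-level split concrete, here is a minimal sketch of how a structure decision (workflow, tool, token budget) might be made first, with a prompt decision conditioned on it. Everything in it is an illustrative assumption: the option lists, the `AgentConfig` type, and the word-count difficulty heuristic are hypothetical stand-ins for the learned policies, which the summary does not specify.

```python
import random
from dataclasses import dataclass

# Hypothetical option sets; the source does not list the actual choices.
WORKFLOWS = ["direct", "plan_then_solve"]
TOOLS = ["none", "search", "calculator"]
BUDGETS = [256, 1024, 4096]
PROMPTS = ["concise", "step_by_step"]


@dataclass(frozen=True)
class AgentConfig:
    workflow: str
    tool: str
    token_budget: int
    prompt: str


def structure_policy(query: str, rng: random.Random) -> tuple[str, str, int]:
    """High-level decision. A toy heuristic stands in for the learned policy."""
    hard = len(query.split()) > 12  # assumed difficulty proxy, not from the paper
    workflow = "plan_then_solve" if hard else "direct"
    tool = rng.choice(TOOLS) if hard else "none"
    budget = BUDGETS[-1] if hard else BUDGETS[0]
    return workflow, tool, budget


def prompt_policy(query: str, structure: tuple[str, str, int]) -> str:
    """Low-level decision, conditioned on the structure chosen above."""
    workflow, _tool, _budget = structure
    return "step_by_step" if workflow != "direct" else "concise"


def configure(query: str, seed: int = 0) -> AgentConfig:
    """Compose both levels into one per-query configuration."""
    rng = random.Random(seed)
    structure = structure_policy(query, rng)
    prompt = prompt_policy(query, structure)
    return AgentConfig(*structure, prompt)
```

In this sketch a short query receives the cheapest configuration, while a longer one is routed to a multi-step workflow with a larger budget. In a learned system, both functions would be replaced by trained policy networks.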
Abstract
Configuring LLM-based agent systems involves choosing workflows, tools, token budgets, and prompts from a large combinatorial design space, and is typically handled today by large fixed templates or hand-tuned heuristics. This leads to brittle behavior and unnecessary compute, since the same heavyweight configuration is often applied to both easy and hard queries. We formulate agent configuration as a query-wise decision problem and introduce ARC (Agentic Resource & Configuration learner), which uses reinforcement learning to train a lightweight hierarchical policy that tailors these configurations to each query. Across multiple benchmarks spanning reasoning and tool-augmented question answering, the learned policy consistently outperforms strong hand-designed configurations and other baselines, achieving up to 25% higher task accuracy while also reducing token and runtime costs. These results demonstrate that learning per-query agent configurations is a powerful alternative to "one size fits all" designs.
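The trade-off between task accuracy and token and runtime costs can be illustrated with a cost-penalized reward. The sketch below is an assumption made for illustration only: the linear form, the function name `shaped_reward`, and the weights are hypothetical, and the paper's actual reward may differ.

```python
def shaped_reward(
    correct: bool,
    tokens_used: int,
    runtime_s: float,
    token_weight: float = 1e-4,  # assumed penalty per token
    time_weight: float = 0.01,   # assumed penalty per second
) -> float:
    """Task reward minus linear cost penalties (illustrative form only)."""
    task_reward = 1.0 if correct else 0.0
    return task_reward - token_weight * tokens_used - time_weight * runtime_s


# Two configurations that both answer correctly: the cheaper one scores higher.
cheap = shaped_reward(correct=True, tokens_used=300, runtime_s=1.0)
costly = shaped_reward(correct=True, tokens_used=4000, runtime_s=8.0)
```

Under a reward of this kind, a policy is pushed toward the smallest configuration that still answers the query correctly, which matches the motivation for not applying one heavyweight template to easy inputs.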
