REX: Rapid Exploration and eXploitation for AI Agents
Rithesh Murthy, Shelby Heinecke, Juan Carlos Niebles, Zhiwei Liu, Le Xue, Weiran Yao, Yihao Feng, Zeyuan Chen, Akash Gokul, Devansh Arpit, Ran Xu, Phil Mui, Huan Wang, Caiming Xiong, Silvio Savarese
TL;DR
REX tackles the challenge of enabling rapid, reward-guided decision-making for LLM-based AI agents without costly fine-tuning. It introduces an exploration-exploitation framework that compresses Monte Carlo Tree Search by predicting complete solution sequences in a single pass and propagating rewards back to earlier steps. The authors propose three algorithms (REX-UCB, REX-$\mathcal{R}$, and REX-UCL) that differ in how they handle rewards and logits, with REX-UCL adjusting the LLM's logits. Empirical results on Blocksworld and GSM8K show REX achieving competitive or better accuracy than CoT, Reflexion, and RAP while substantially reducing execution time, which supports its practical viability for diverse reasoning tasks. Overall, REX provides a lightweight, prompt-driven way to integrate reinforcement-like signals into LLM control without fine-tuning.
Abstract
In this paper, we propose REX, an enhanced approach for Rapid Exploration and eXploitation for AI Agents. Existing AutoGPT-style techniques have inherent limitations, such as a heavy reliance on precise descriptions for decision-making and the lack of a systematic approach to leveraging try-and-fail procedures akin to traditional Reinforcement Learning (RL). REX introduces an additional layer of rewards and integrates concepts similar to Upper Confidence Bound (UCB) scores, leading to more robust and efficient AI agent performance. This approach enables the use of offline behaviors from logs and integrates seamlessly with existing foundation models, without requiring any model fine-tuning. In comparative analysis with existing methods such as Chain-of-Thought (CoT) and Reasoning viA Planning (RAP), REX-based methods demonstrate comparable performance and, in certain cases, surpass these techniques. Notably, REX-based methods substantially reduce execution time, enhancing their practical applicability across a diverse set of scenarios.
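For readers unfamiliar with the Upper Confidence Bound scores the abstract refers to, the following is a minimal sketch of the standard UCB1 rule, which adds an exploration bonus to each action's mean reward. It is not the authors' implementation: the function names `ucb1_score` and `select_action`, the `stats` layout, and the default exploration constant `c` are all assumptions made for illustration.

```python
import math


def ucb1_score(mean_reward, action_count, total_count, c=math.sqrt(2)):
    """Standard UCB1 score: mean reward plus an exploration bonus.

    Hypothetical helper for illustration; not taken from the REX paper.
    """
    if action_count == 0:
        # Untried actions get an infinite score so they are explored first.
        return math.inf
    bonus = c * math.sqrt(math.log(total_count) / action_count)
    return mean_reward + bonus


def select_action(stats, c=math.sqrt(2)):
    """Return the action with the highest UCB1 score.

    `stats` maps each action to a (cumulative_reward, count) pair
    (an assumed layout chosen for this sketch).
    """
    total = sum(count for _, count in stats.values())

    def score(action):
        reward_sum, count = stats[action]
        mean = reward_sum / count if count else 0.0
        return ucb1_score(mean, count, total, c)

    return max(stats, key=score)
```

With equal visit counts the rule favors the action with the higher mean reward (exploitation); when one action has been tried far less often, its bonus grows and it may be chosen despite a lower mean (exploration).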
