REX: Rapid Exploration and eXploitation for AI Agents

Rithesh Murthy; Shelby Heinecke; Juan Carlos Niebles; Zhiwei Liu; Le Xue; Weiran Yao; Yihao Feng; Zeyuan Chen; Akash Gokul; Devansh Arpit; Ran Xu; Phil Mui; Huan Wang; Caiming Xiong; Silvio Savarese

TL;DR

REX enables rapid, reward-guided decision-making for LLM-based AI agents without costly fine-tuning. It introduces an exploration-exploitation framework that compresses Monte Carlo Tree Search by predicting a complete solution sequence in a single pass and propagating the resulting reward back to the earlier steps. The authors propose three algorithms, REX-UCB, REX-$\mathcal{R}$, and REX-UCL, each with its own mechanism for handling rewards; REX-UCL additionally adjusts the LLM's logits. Empirical results on Blocksworld and GSM8K show REX achieving accuracy competitive with or better than CoT, Reflexion, and RAP while substantially reducing execution time. Overall, REX provides a lightweight, prompt-driven way to integrate reinforcement-like signals into LLM control without fine-tuning.

Abstract

In this paper, we propose an enhanced approach for Rapid Exploration and eXploitation for AI Agents called REX. Existing AutoGPT-style techniques have inherent limitations, such as a heavy reliance on precise descriptions for decision-making and the lack of a systematic approach to leveraging try-and-fail procedures akin to traditional Reinforcement Learning (RL). REX introduces an additional layer of rewards and integrates concepts similar to Upper Confidence Bound (UCB) scores, leading to more robust and efficient AI agent performance. This approach enables the use of offline behaviors from logs and integrates seamlessly with existing foundation models without requiring any model fine-tuning. In a comparative analysis with existing methods such as Chain-of-Thought (CoT) and Reasoning viA Planning (RAP), REX-based methods demonstrate comparable performance and, in certain cases, surpass the results achieved by these techniques. Notably, REX-based methods exhibit substantial reductions in execution time, enhancing their practical applicability across a diverse set of scenarios.
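The reward layer and UCB-like scores mentioned above can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes the standard UCB1 form (mean reward plus an exploration bonus), a single terminal reward credited to every step of a complete solution, and hypothetical names (`StepStats`, `backpropagate`, `ucb`, the constant `c`) that do not come from the source. How the scores are fed back to the LLM is not shown.

```python
import math
from collections import defaultdict


class StepStats:
    """Reward statistics per (state, action) pair. Hypothetical helper, not from the paper."""

    def __init__(self, c=1.0):
        self.c = c                               # exploration constant (assumed value)
        self.state_visits = defaultdict(int)     # N(s)
        self.action_visits = defaultdict(int)    # N(s, a)
        self.total_reward = defaultdict(float)   # summed reward for (s, a)

    def backpropagate(self, steps, reward):
        """Credit one terminal reward to every step of a complete solution."""
        for state, action in steps:
            self.state_visits[state] += 1
            self.action_visits[(state, action)] += 1
            self.total_reward[(state, action)] += reward

    def ucb(self, state, action):
        """Standard UCB1: mean reward + c * sqrt(ln N(s) / N(s, a))."""
        n_sa = self.action_visits[(state, action)]
        if n_sa == 0:
            return math.inf  # untried actions are explored first
        mean = self.total_reward[(state, action)] / n_sa
        bonus = self.c * math.sqrt(math.log(self.state_visits[state]) / n_sa)
        return mean + bonus


stats = StepStats(c=1.0)
# Two complete solutions proposed in single passes; only the first was correct.
stats.backpropagate([("s0", "a"), ("s1", "b")], reward=1.0)
stats.backpropagate([("s0", "c"), ("s2", "d")], reward=0.0)
print(stats.ucb("s0", "a") > stats.ucb("s0", "c"))
```

In this toy run, action `a` at state `s0` scores higher than `c` because both were tried once but only `a` belonged to a rewarded solution; an action never tried at `s0` would score infinity and be preferred for exploration.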

Paper Structure (24 sections, 8 figures, 2 tables, 2 algorithms)

This paper contains 24 sections, 8 figures, 2 tables, 2 algorithms.

Introduction
Related Work
Monte Carlo Tree Search
Proposed Methodology
Rapid Exploration and Exploitation: REX
Algorithm 1: REX-UCB
Algorithm 2: REX-$\mathcal{R}$
Algorithm 3: REX-UCL
Experiments & Discussion
Baseline
Blocksworld
GSM8K
REX: Accuracy, Speed, and Limitations
REX: UCB for effective Exploration and Exploitation
Conclusion
...and 9 more sections

Figures (8)

Figure 1: The four major steps of MCTS are depicted in the above figure. These steps are executed sequentially 'N' times.
Figure 2: REX
Figure 3: REX Flowchart
Figure 4: Prompt for REX-UCB
Figure 5: REX-UCL
...and 3 more figures
