CoPS: Empowering LLM Agents with Provable Cross-Task Experience Sharing

Chen Yang; Chenyang Zhao; Quanquan Gu; Dongruo Zhou

TL;DR

CoPS (Cross-Task Experience Sharing) is a general, pessimism-based algorithm that selects and reuses cross-task experiences for LLM agents. It covers both offline and online settings through a memory bank and a decoder-driven distribution-matching objective. Empirically, CoPS outperforms state-of-the-art baselines on Alfworld, Webshop, and HotPotQA across model scales, with better sample efficiency. The theoretical analysis ties performance to the quality of the pretrained LLM and to how well the agent's task-dependent trial distribution matches the one generated by the LLM, which offers guidance for design and deployment across diverse tasks.

Abstract

Sequential reasoning in agent systems has been significantly advanced by large language models (LLMs), yet existing approaches face limitations. Reflection-driven reasoning relies solely on knowledge in pretrained models, limiting performance in novel scenarios, while experience-assisted reasoning often depends on external experiences and lacks clear principles for selecting representative experiences. We address these limitations by proposing CoPS (Cross-Task Experience Sharing), a generalizable algorithm that enhances sequential reasoning through cross-task experience sharing and selection. In detail, CoPS leverages agents' experiences on previous tasks, selecting distribution-matched experiences via a provable pessimism-based strategy to maximize utility while minimizing the risk from distribution shifts. Extensive experimental results on benchmarks such as Alfworld, Webshop, and HotPotQA demonstrate that CoPS consistently outperforms state-of-the-art baselines, with superior sample efficiency suitable for resource-constrained scenarios. Theoretically, we show that the performance of our algorithm depends on both the quality of the pretrained LLM and the match between the agent's task-dependent trial distribution and that generated by the LLM. Our work bridges the gap between existing sequential reasoning paradigms and validates the effectiveness of leveraging cross-task experiences, shedding light on the potential to improve agents' generalization and adaptability across diverse tasks. Our code is available at https://github.com/uclaml/COPS.
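To make the idea of pessimism-based selection concrete, the following is a minimal illustrative sketch and not the authors' implementation (see the linked repository for that). It assumes each stored experience carries a task embedding and a scalar reward, and it scores an experience as its reward minus a penalty proportional to its cosine distance from the current task. The `Experience` class, the `select_experiences` function, the cosine-distance measure, and the `penalty` parameter are all hypothetical choices made for this example.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Experience:
    """One stored trial from a previous task (hypothetical structure)."""
    task_embedding: List[float]
    reward: float  # e.g. 1.0 for a successful trial, 0.0 for a failed one
    text: str


def cosine_distance(u: List[float], v: List[float]) -> float:
    """Return 1 - cosine similarity; 1.0 if either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def select_experiences(
    memory: List[Experience],
    query_embedding: List[float],
    k: int = 2,
    penalty: float = 1.0,
) -> List[Experience]:
    """Pick the k experiences with the highest pessimistic score.

    Assumed score: reward - penalty * distance to the current task, so
    high-reward experiences from poorly matched tasks are discounted.
    """
    def score(exp: Experience) -> float:
        return exp.reward - penalty * cosine_distance(
            exp.task_embedding, query_embedding
        )

    return sorted(memory, key=score, reverse=True)[:k]


memory = [
    Experience([1.0, 0.0], 1.0, "close success"),
    Experience([0.0, 1.0], 1.0, "far success"),
    Experience([1.0, 0.1], 0.0, "close failure"),
]
mild = select_experiences(memory, [1.0, 0.0], k=2, penalty=1.0)
strict = select_experiences(memory, [1.0, 0.0], k=2, penalty=2.0)
```

In this toy memory bank, raising `penalty` makes the selection more conservative: a successful trial from a mismatched task is ranked below a failed trial from a closely matched one.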
