CoPS: Empowering LLM Agents with Provable Cross-Task Experience Sharing

Chen Yang, Chenyang Zhao, Quanquan Gu, Dongruo Zhou

TL;DR

CoPS introduces Cross-Task Experience Sharing, a general, pessimism-based algorithm to reuse and select cross-task experiences for LLM agents. It unifies offline and online settings through a memory bank and a decoder-driven distribution-matching objective, enabling efficient and robust sequential reasoning. Empirically, CoPS achieves state-of-the-art performance on Alfworld, Webshop, and HotPotQA across model scales, with notable sample efficiency advantages. Theoretical analysis ties performance to LLM pretraining quality and task-distribution alignment, offering guidance for design and deployment in diverse tasks.

Abstract

Sequential reasoning in agent systems has been significantly advanced by large language models (LLMs), yet existing approaches face limitations. Reflection-driven reasoning relies solely on knowledge in pretrained models, limiting performance in novel scenarios, while experience-assisted reasoning often depends on external experiences and lacks clear principles for selecting representative experiences. We address these limitations by proposing CoPS (Cross-Task Experience Sharing), a generalizable algorithm that enhances sequential reasoning by cross-task experience sharing and selection. In detail, CoPS leverages agents' experiences on previous tasks, selecting distribution-matched experiences via a provable pessimism-based strategy to maximize utility while minimizing risks from distribution shifts. Extensive experimental results on benchmarks like Alfworld, Webshop, and HotPotQA demonstrate that CoPS consistently outperforms state-of-the-art baselines, with superior sample efficiency suitable for resource-constrained scenarios. Theoretically, we show that the performance of our algorithm depends on both the quality of the pretrained LLM and the matching between the agent's task-dependent trial distribution and that generated by the LLM. Our work bridges the gap between existing sequential reasoning paradigms and validates the effectiveness of leveraging cross-task experiences, shedding light on the potential to improve agents' generalization and adaptability across diverse tasks. Our code is available at https://github.com/uclaml/COPS.
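
To make the selection idea in the abstract concrete, the following is a minimal sketch of pessimism-style experience selection from a memory bank. It is not the authors' objective: the score `reward - c * distance`, the cosine distance, and the names `Experience`, `cosine_distance`, and `select_experiences` are all assumptions made for illustration. Only the roles of `c` (scaling factor) and `k` (number of in-context experiences) come from the paper's hyperparameter description.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Experience:
    """One stored trial from a previous task (hypothetical structure)."""
    task: str
    trajectory: str
    reward: float            # e.g. 1.0 for success, 0.0 for failure
    embedding: list[float]   # assumed to come from some text encoder


def cosine_distance(u: list[float], v: list[float]) -> float:
    """Return 1 - cosine similarity; treat zero vectors as maximally distant."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 1.0
    return 1.0 - dot / (norm_u * norm_v)


def select_experiences(memory, query_embedding, c=1.0, k=2):
    """Pick the k experiences with the highest penalized score.

    Assumed score: reward minus c times the distance to the current task.
    A larger c is more pessimistic about experiences far from the query.
    """
    scored = [
        (exp.reward - c * cosine_distance(exp.embedding, query_embedding), exp)
        for exp in memory
    ]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [exp for _, exp in scored[:k]]


# Toy memory bank with made-up tasks and 2-D embeddings.
memory = [
    Experience("A", "trajectory of A", 1.0, [1.0, 0.0]),
    Experience("B", "trajectory of B", 1.0, [0.6, 0.8]),
    Experience("C", "trajectory of C", 0.0, [1.0, 0.0]),
]
query = [1.0, 0.0]

mild = [e.task for e in select_experiences(memory, query, c=1.0, k=2)]
strict = [e.task for e in select_experiences(memory, query, c=3.0, k=2)]
print(mild, strict)
```

In this toy setup, raising `c` makes the selector prefer experiences close to the current task even when their reward is lower, which is the trade-off between utility and distribution-shift risk that the abstract describes.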

Paper Structure

This paper contains 21 sections, 4 theorems, 29 equations, 3 figures, 5 tables, 3 algorithms.

Key Result

Theorem 5.4

By setting and denote $\mathbb{P}^{*,s} = \mathop{\mathrm{argmax}}_{{\widehat{\mathbb{P}}}\in\mathcal{P}}\mathbb{E}_{ {\mathcal{T}} \sim {\widehat{\mathbb{P}}}(\cdot|\mathcal{D}, s), a\sim \overline{{\text{Alg}^E}}(\cdot|{\mathcal{T}}, s)}r(s,a)$, we have the following bound with probability at least $1-2/T$:

Figures (3)

  • Figure 1: A brief illustration of CoPS, which fully leverages agents' cross-task experiences to enhance sequential reasoning by sharing and selecting distribution-matched experiences from previous task trajectories.
  • Figure 2: Comparative evaluation of CoPS, Reflexion, RAP, and LATS across three benchmarks: Alfworld, Webshop, and HotPotQA. The figures illustrate the success rates for both the smaller Llama 3.1 8b and larger Llama 3.1 70b models, averaged over 10 trials.
  • Figure 3: Performance impact of hyperparameters $c$ (scaling factor) and $k$ (number of in-context experiences) on the Alfworld benchmark for both Llama 3.1 8b and Llama 3.1 70b models.

Theorems & Definitions (5)

  • Definition 5.1: lin2023transformers
  • Theorem 5.4
  • Theorem 5.5
  • Lemma A.1: Lemma 20, lin2023transformers
  • Lemma A.2