CoWork-X: Experience-Optimized Co-Evolution for Multi-Agent Collaboration System

Zexin Lin, Jiachen Yu, Haoyang Zhang, Yuzhao Li, Zhonghang Li, Yujiu Yang, Junjie Wang, Xiaoqiang Ji

TL;DR

CoWork-X tackles real-time coordination and continual cross-episode adaptation under strict online budgets by introducing an Execute--Optimize loop that separates fast, HTN-based execution from offline, patch-style skill updates. A shared HTN skill library $\mathcal{S}_k$ is incrementally improved by a post-episode Co-Optimizer using episode logs, enabling stable multi-agent co-evolution. In Overcooked-like benchmarks, CoWork-X achieves sustained gains with zero online tokens and markedly lower latency ($\approx 2.6$ s per episode) compared to baselines that rely on frequent in-episode LLM reasoning, while generalizing across multiple LLM backbones. The work demonstrates practical, scalable cross-episode collaboration and highlights the value of log-grounded, verifier-driven skill consolidation for real-time multi-agent systems.

Abstract

Large language models are enabling language-conditioned agents in interactive environments, but highly cooperative tasks often impose two simultaneous constraints: sub-second real-time coordination and sustained multi-episode adaptation under a strict online token budget. Existing approaches either rely on frequent in-episode reasoning that induces latency and timing jitter, or deliver post-episode improvements through unstructured text that is difficult to compile into reliable low-cost execution. We propose CoWork-X, an active co-evolution framework that casts peer collaboration as a closed-loop optimization problem across episodes, inspired by fast--slow memory separation. CoWork-X instantiates a Skill-Agent that executes via hierarchical task network (HTN)-based skill retrieval from a structured, interpretable, and compositional skill library, and a post-episode Co-Optimizer that performs patch-style skill consolidation with explicit budget constraints and drift regularization. Experiments on challenging Overcooked-AI-like real-time collaboration benchmarks demonstrate that CoWork-X achieves stable, cumulative performance gains while steadily reducing online latency and token usage.
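
To make the fast/slow separation concrete, the following is a minimal Python sketch of an execute-then-patch loop, not the paper's implementation. Everything in it is an illustrative assumption: the `Skill` record, the two toy skills `cook` and `serve`, the hidden `TRUE_REQUIREMENTS` table standing in for the environment, and the `patch_budget` parameter. In particular, `co_optimize` is a rule-based stand-in for the LLM Co-Optimizer; it only shows the shape of a log-grounded patch that adds a missing precondition.

```python
from dataclasses import dataclass, replace
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Skill:
    name: str
    preconditions: Tuple[str, ...]         # facts the library believes are required
    effects: Tuple[Tuple[str, bool], ...]  # facts set when the skill succeeds


# Hypothetical ground truth, hidden from the agent: what the toy environment requires.
TRUE_REQUIREMENTS = {"cook": (), "serve": ("soup_cooked",)}


def execute_episode(library: List[Skill], horizon: int = 6):
    """Fast path: each tick, run the first skill whose preconditions hold. No LLM call."""
    facts: Dict[str, bool] = {}
    score, log = 0, []
    for tick in range(horizon):
        skill = next((s for s in library
                      if all(facts.get(p, False) for p in s.preconditions)), None)
        if skill is None:
            break
        missing = [r for r in TRUE_REQUIREMENTS[skill.name] if not facts.get(r, False)]
        if missing:
            # The environment rejects the action; the tick is wasted and logged.
            log.append({"tick": tick, "skill": skill.name, "missing": missing})
            continue
        facts.update(dict(skill.effects))
        score += 1 if skill.name == "serve" else 0
    return score, log


def co_optimize(library: List[Skill], log: List[dict], patch_budget: int = 1):
    """Slow path stand-in: convert logged failures into at most `patch_budget` patches."""
    patched, used = list(library), 0
    for entry in log:
        if used >= patch_budget:
            break
        idx = next(i for i, s in enumerate(patched) if s.name == entry["skill"])
        new = tuple(p for p in entry["missing"] if p not in patched[idx].preconditions)
        if new:
            # Patch-style update: only the affected skill changes, the rest is untouched.
            patched[idx] = replace(patched[idx],
                                   preconditions=patched[idx].preconditions + new)
            used += 1
    return patched


def run(episodes: int = 3):
    # Initial library: "serve" is listed first and wrongly has no preconditions.
    library = [Skill("serve", (), (("soup_cooked", False),)),
               Skill("cook", (), (("soup_cooked", True),))]
    scores = []
    for _ in range(episodes):
        score, log = execute_episode(library)   # execute with the current library
        scores.append(score)
        library = co_optimize(library, log)     # patch offline, between episodes
    return scores, library
```

Calling `run()` in this toy setting returns scores `[0, 3, 3]`: the first episode wastes every tick on a premature `serve`, the post-episode patch adds `soup_cooked` as a precondition, and later episodes alternate `cook` and `serve` with no in-episode reasoning. Capping the number of patches per episode is one simple way to mimic an explicit update budget; the paper's verifier and drift regularization are not modeled here.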

Paper Structure (28 sections, 11 figures, 5 tables)

Figures (11)

  • Figure 1: CoWork-X overview. Skill-Agents execute via a shared skill library, and an LLM Co-Optimizer updates it from episode logs for closed-loop co-evolution.
  • Figure 2: CoWork-X Execute--Optimize loop. A Skill-Agent executes an HTN policy from $\mathcal{S}_k$, then an LLM Co-Optimizer diagnoses episode logs and patches $\mathcal{S}_k\!\to\!\mathcal{S}_{k+1}$ (e.g., adding preconditions), improving performance in subsequent episodes.
  • Figure 3: Performance across 30 episodes. CoWork-X shows consistent improvement across iterations. Baselines show unstable performance from frequent online LLM calls.
  • Figure 4: Score efficiency. CoWork-X achieves higher score-per-resource ratios: 0.92 score/s and 5.9 score/1k tokens, versus DPT-WToM's 0.09 and 0.20.
  • Figure 5: Ablation study of different families of models. Each model was tested in 5 independent runs, each with 10 iterations.
  • ...and 6 more figures