Symbiotic Cooperation for Web Agents: Harnessing Complementary Strengths of Large and Small LLMs

Ruichen Zhang, Mufan Qiu, Zhen Tan, Mohan Zhang, Vincent Lu, Jie Peng, Kaidi Xu, Leandro Z. Agudelo, Peter Qian, Tianlong Chen

TL;DR

The paper introduces AgentSymbiotic, an iterative framework that couples data synthesis and distillation across large and small LLMs to improve web agents. Large LLMs generate high-quality trajectories and enrich a RAG knowledge base, while distilled small LLMs explore diverse trajectories and refine reasoning through multi-task learning and speculative data synthesis. A privacy-preserving hybrid mode directs sensitive steps to local LLMs, addressing user data concerns. On the WEBARENA benchmark, the approach achieves state-of-the-art results, with Claude-3.5 reaching 52.1% SR and 8B LLaMA distillations achieving 48.5–49% SR, significantly outperforming prior baselines. The work demonstrates the value of symbiotic cooperation between LLM scales for robust, efficient, and privacy-conscious web-agent intelligence.

Abstract

Web browsing agents powered by large language models (LLMs) have shown tremendous potential in automating complex web-based tasks. Existing approaches typically rely on large LLMs (e.g., GPT-4o) to explore web environments and generate trajectory data, which is then used either for demonstration retrieval (for large LLMs) or to distill small LLMs (e.g., Llama3) in a process that remains decoupled from the exploration. In this paper, we propose AgentSymbiotic, an iterative framework that couples data synthesis with task performance, yielding a "symbiotic improvement" for both large and small LLMs. Our study uncovers a complementary dynamic between LLM types: while large LLMs excel at generating high-quality trajectories for distillation, the distilled small LLMs, owing to their distinct reasoning capabilities, often choose actions that diverge from those of their larger counterparts. This divergence drives the exploration of novel trajectories, thereby enriching the synthesized data. However, we also observe that the performance of small LLMs becomes a bottleneck in this iterative enhancement process. To address this, we propose two innovations in LLM distillation: a speculative data synthesis strategy that mitigates off-policy bias, and a multi-task learning approach designed to boost the reasoning capabilities of the student LLM. Furthermore, we introduce a Hybrid Mode for Privacy Preservation to address user privacy concerns. Evaluated on the WEBARENA benchmark, AgentSymbiotic achieves SOTA performance with both LLM types. Our best large LLM agent reaches 52%, surpassing the previous best of 45%, while our 8B distilled model demonstrates a competitive 49%, exceeding the prior best of 28%. Code will be released upon acceptance.

Paper Structure

This paper contains 29 sections, 3 equations, 7 figures, 2 tables, 2 algorithms.

Figures (7)

  • Figure 1: Illustration of the symbiotic improvement between small and large LLMs, where each of them benefits the other.
  • Figure 2: Overview of the AgentSymbiotic framework. Step 1: The large LLM interacts with the environment to generate high-quality trajectories, which are then used to distill small LLMs. Step 2: Multi-task learning and Speculative Data Synthesis are applied during distillation to enhance the reasoning capabilities of the small LLM and mitigate off-policy bias between the two LLMs. Step 3: The small LLM further explores the environment to produce diverse and valuable trajectories. Step 4: The knowledge base, containing both high-quality and comprehensive trajectories, is then incorporated into the large LLM's RAG process, improving its performance. This iterative process establishes a symbiotic improvement cycle, enhancing both large and small LLMs over time.
  • Figure 3: Overview of two key innovations in LLM distillation: (a) Speculative Data Synthesis, which mitigates off-policy bias by leveraging both large and small LLMs. At each step, the small LLM generates an action based on the observation, while the large LLM produces a set of top-$K$ action candidates. If the small LLM's action is within the large LLM's top-$K$ actions, it is accepted (✓); otherwise, it is rejected (✗) and the large LLM's action is chosen for subsequent interactions. (b) Multi-task Learning, which enhances reasoning capabilities by training the small LLM to predict both actions and rationales, enabling it to handle multiple tasks and recover reasoning capabilities that would otherwise be lost during distillation. CoT indicates Chain-of-Thought [wang2024chain].
  • Figure 4: The Privacy Detector analyzes each step's observation and action for private data. If detected, a local small LLM ensures confidentiality by predicting the next action and reason. Otherwise, a cloud-based large LLM handles predictions, leveraging its superior reasoning capabilities for non-sensitive tasks.
  • Figure 5: The synergy metric ($\Delta$), which is defined in Equation \ref{eq:synergy_iterative}, increases as the iterations progress.
  • ...and 2 more figures
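
The per-step decision rules described in the captions of Figure 3(a) and Figure 4 can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the function names `speculative_step`, `route_step`, and `toy_privacy_detector` are hypothetical, the policies are passed in as plain callables, the fallback on rejection is assumed to be the large LLM's top-ranked candidate (the caption only says "the large LLM's action"), and the keyword-based detector is a toy stand-in for whatever Privacy Detector the paper uses.

```python
import re
from typing import Callable, Sequence, Tuple


def speculative_step(
    observation: str,
    small_policy: Callable[[str], str],
    large_top_k: Callable[[str, int], Sequence[str]],
    k: int = 3,
) -> Tuple[str, bool]:
    """Return (action_to_execute, accepted) for one interaction step.

    The small LLM proposes an action; it is accepted only if it appears
    among the large LLM's top-k candidates for the same observation.
    """
    proposal = small_policy(observation)
    candidates = list(large_top_k(observation, k))
    if not candidates:
        raise ValueError("large policy returned no candidates")
    if proposal in candidates:
        return proposal, True
    # Assumption: on rejection, fall back to the top-ranked large-LLM action.
    return candidates[0], False


def toy_privacy_detector(observation: str, action: str) -> bool:
    """Hypothetical detector: flags a step if obvious sensitive keywords appear."""
    pattern = re.compile(r"password|credit card|ssn", re.IGNORECASE)
    return bool(pattern.search(observation) or pattern.search(action))


def route_step(
    observation: str,
    action: str,
    contains_private_data: Callable[[str, str], bool] = toy_privacy_detector,
) -> str:
    """Pick which model handles the next prediction under the hybrid mode."""
    if contains_private_data(observation, action):
        return "local_small_llm"   # sensitive step stays on the local model
    return "cloud_large_llm"       # non-sensitive step goes to the large model


if __name__ == "__main__":
    small = lambda obs: "click [12]"
    large = lambda obs, k: ["click [12]", "scroll [down]"][:k]
    print(speculative_step("search results page", small, large, k=2))
    print(route_step("Enter your password", "type [7] [hunter2]"))
```

In this sketch the acceptance test is plain string equality between actions; a real agent would likely need to normalize actions (for example, element IDs and typed text) before comparing them.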