
CCA: Collaborative Competitive Agents for Image Editing

Tiankai Hang, Shuyang Gu, Dong Chen, Xin Geng, Baining Guo

TL;DR

The paper presents Collaborative Competitive Agents (CCA), a multi-agent framework for image editing that pairs two equal-status generator agents with a discriminator to decompose complex user instructions into subtasks, execute them with a library of tools, and iteratively refine results through feedback. By exposing intermediate steps and enabling cross-agent learning, CCA achieves robust handling of intricate edits beyond single-tool or single-model approaches. The framework formalizes planning, execution, and feedback loops, including a hierarchical tool configuration and a quality competitor for early stopping, and demonstrates effectiveness through extensive experiments and ablation studies. The work highlights the value of collaborative competition among agents and suggests broad applicability beyond editing, with code and tools released to support reproducibility and further research.

Abstract

This paper presents a novel generative model, Collaborative Competitive Agents (CCA), which leverages the capabilities of multiple Large Language Model (LLM)-based agents to execute complex tasks. Drawing inspiration from Generative Adversarial Networks (GANs), the CCA system employs two equal-status generator agents and a discriminator agent. The generators independently process user instructions and generate results, while the discriminator evaluates the outputs and provides feedback that the generator agents use to reflect on and improve their results. Unlike previous generative models, our system exposes the intermediate steps of generation. This transparency allows each generator agent to learn from the other's successful executions, enabling a collaborative competition that enhances the quality and robustness of the system's results. The primary focus of this study is image editing, demonstrating CCA's ability to handle intricate instructions robustly. The paper's main contributions include the introduction of a multi-agent-based generative model with controllable intermediate steps and iterative optimization, a detailed examination of agent relationships, and comprehensive experiments on image editing. Code is available at https://github.com/TiankaiHang/CCA.
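
The generate, evaluate, and reflect loop described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the names `Attempt`, `run_cca`, `make_generator`, and `toy_discriminator` are hypothetical, plans are plain lists of strings rather than tool calls, and the "discriminator" is a simple checklist instead of an LLM. It only shows how two equal-status generators might receive feedback and reuse the best visible plan from a peer.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Attempt:
    plan: List[str]        # intermediate steps, visible to every agent
    result: str            # stand-in for the edited image
    score: float = 0.0
    feedback: str = ""


# Hypothetical agent signatures (assumptions, not the paper's API).
Generator = Callable[[str, str, List[str]], Attempt]
Discriminator = Callable[[str, Attempt], Tuple[float, str]]


def run_cca(instruction: str, generators: List[Generator],
            discriminator: Discriminator, max_rounds: int = 3,
            good_enough: float = 1.0) -> Attempt:
    best = None
    feedbacks = [""] * len(generators)
    for _ in range(max_rounds):
        peer_plan = best.plan if best else []
        attempts = []
        for i, generate in enumerate(generators):
            # Each generator sees its own feedback and the best plan so far.
            attempt = generate(instruction, feedbacks[i], peer_plan)
            attempt.score, attempt.feedback = discriminator(instruction, attempt)
            feedbacks[i] = attempt.feedback
            attempts.append(attempt)
        round_best = max(attempts, key=lambda a: a.score)
        if best is None or round_best.score > best.score:
            best = round_best
        if best.score >= good_enough:  # stop early once the result suffices
            break
    return best


# Toy agents for the example instruction used in Figure 2.
REQUIRED = ["replace dog with corgi", "apply pixel style"]


def make_generator(first_step: str) -> Generator:
    def generate(instruction: str, feedback: str, peer_plan: List[str]) -> Attempt:
        plan = [first_step]
        for step in peer_plan:              # collaboration: adopt peer steps
            if step not in plan:
                plan.append(step)
        if feedback.startswith("missing: "):  # reflection: act on feedback
            missing = feedback[len("missing: "):]
            if missing not in plan:
                plan.append(missing)
        return Attempt(plan=plan, result=" -> ".join(plan))
    return generate


def toy_discriminator(instruction: str, attempt: Attempt) -> Tuple[float, str]:
    missing = [s for s in REQUIRED if s not in attempt.plan]
    score = 1 - len(missing) / len(REQUIRED)
    return score, (f"missing: {missing[0]}" if missing else "ok")


best = run_cca(
    "Change the dog to corgi and transform the image to pixel style",
    [make_generator(REQUIRED[0]), make_generator(REQUIRED[1])],
    toy_discriminator,
)
print(best.score, best.plan)
```

In this toy run each generator covers only one subtask in the first round; the second round combines feedback with the peer's plan, so the loop stops before reaching `max_rounds`.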

Paper Structure (25 sections, 5 equations, 12 figures, 3 tables, 1 algorithm)

This paper contains 25 sections, 5 equations, 12 figures, 3 tables, and 1 algorithm.

Figures (12)

  • Figure 1: The framework of our Collaborative Competitive Agents system. Through providing feedback, the discriminator agent encourages the generator agent to engage in both collaborative learning and competition. The system's performance undergoes iterative optimization to effectively meet user requirements.
  • Figure 2: The example user requirement is: "Change the dog to corgi and transform the image to pixel style". Yes/No questions yield more effective feedback.
  • Figure 3: An example of the effect of the hierarchical tool setting. The given user request is "Enrich wooden frames to the photo and adjust the longer side to 512".
  • Figure 4: Qualitative comparison between InstructPix2Pix [brooks2023instructpix2pix], MagicBrush [Zhang2023MagicBrush], InstructDiffusion [Geng23instructdiff], VisProg [gupta2023visual], and ours.
  • Figure 5: Comparison of single tool vs. multiple tools. Prompt: Replace the house with a wooden one, and turn the stones in front of the house into flowers.
  • ...and 7 more figures
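
The Figure 2 caption notes that Yes/No questions give more effective feedback. One way to picture this, as a rough sketch under assumptions, is to split the user requirement into binary checks and report which ones failed. The function `yes_no_feedback`, the question wording, and the `observed` answers below are all hypothetical; a real system would obtain the answers from a vision-language model inspecting the edited image.

```python
from typing import Callable, List, Tuple


def yes_no_feedback(questions: List[str],
                    answer: Callable[[str], bool]) -> Tuple[float, str]:
    """Score an edit as the fraction of Yes answers and list the failed checks."""
    failed = [q for q in questions if not answer(q)]
    score = (len(questions) - len(failed)) / len(questions)
    if not failed:
        return score, "All checks passed."
    return score, "Failed checks: " + "; ".join(failed)


# Hypothetical checks derived from the Figure 2 requirement.
questions = ["Is the dog a corgi?", "Is the image in pixel style?"]

# Hypothetical answers standing in for a model's judgment of one edited image.
observed = {"Is the dog a corgi?": True, "Is the image in pixel style?": False}

score, feedback = yes_no_feedback(questions, lambda q: observed[q])
print(score, feedback)
```

Binary checks like these tell a generator exactly which part of the instruction is still unmet, which is easier to act on than a single free-form quality rating.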