The Single-Multi Evolution Loop for Self-Improving Model Collaboration Systems

Shangbin Feng, Kishan Panaganti, Yulia Tsvetkov, Wenhao Yu

TL;DR

The paper tackles the high cost of multi-LLM collaboration by distilling collaborative outputs into a single language model, so that inference needs only one model while preserving the benefits of collaboration. It introduces the single-multi evolution loop, which alternates a multi-step (multiple LMs collaborate) with a single-step (each LM distills from the collaborative outputs); the distilled LMs then collaborate again, so the models keep improving across iterations. In experiments across 7 collaboration strategies, 3 distillation methods, and 15 tasks, individual models improve by 8.0% on average and the collaborative system improves by 14.9% on average, and the approach outperforms existing evolutionary AI methods by about 7.7% on average. The work reports strong improvements on reasoning and knowledge tasks, shows that the method is robust across model pools and collaboration modes, and highlights practical implications for scalable, self-improving AI ecosystems, while also noting safety considerations for real-world deployment.

Abstract

Model collaboration -- systems where multiple language models (LMs) collaborate -- combines the strengths of diverse models, but at the cost of loading multiple LMs. We improve efficiency while preserving the strengths of collaboration by distilling collaborative patterns into a single model, where the model is trained on the outputs of the model collaboration system. At inference time, only the distilled model is employed: it imitates the collaboration while incurring only the cost of a single model. Furthermore, we propose the single-multi evolution loop: multiple LMs collaborate, each distills from the collaborative outputs, and these post-distillation improved LMs collaborate again, forming a collective evolution ecosystem where models evolve and self-improve by interacting with an environment of other models. Extensive experiments with 7 collaboration strategies and 15 tasks (QA, reasoning, factuality, etc.) demonstrate that: 1) individual models improve by 8.0% on average, absorbing the strengths of collaboration while reducing the cost to a single model; 2) the collaboration also benefits from the stronger and more synergistic LMs after distillation, improving over initial systems without evolution by 14.9% on average. Analysis reveals that the single-multi evolution loop outperforms various existing evolutionary AI methods, is compatible with diverse model/collaboration/distillation settings, and helps solve problems where the initial model/system struggles.
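
The loop described in the abstract can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: `ToyModel`, `collaborate`, and `single_multi_loop` are hypothetical names, each "model" is a lookup table rather than an LM, the collaboration algorithm is a simple majority vote, and "distillation" just copies the collaborative answers into each model's table.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

UNKNOWN = "unknown"


@dataclass
class ToyModel:
    """Hypothetical stand-in for an LM: a lookup table from prompt to answer."""
    name: str
    memory: Dict[str, str] = field(default_factory=dict)

    def generate(self, prompt: str) -> str:
        return self.memory.get(prompt, UNKNOWN)

    def distill(self, pairs: List[Tuple[str, str]]) -> None:
        # Assumption: stands in for fine-tuning on (prompt, teacher output) pairs.
        for prompt, answer in pairs:
            self.memory[prompt] = answer


def collaborate(models: List[ToyModel], prompt: str) -> str:
    """Toy collaboration algorithm: majority vote over the answers that are known."""
    known = [a for a in (m.generate(prompt) for m in models) if a != UNKNOWN]
    return Counter(known).most_common(1)[0][0] if known else UNKNOWN


def single_multi_loop(models: List[ToyModel], prompts: List[str],
                      iterations: int = 1) -> List[ToyModel]:
    for _ in range(iterations):
        # Multi-step: the pool of models collaborates to produce teacher outputs.
        teacher_data = [(p, collaborate(models, p)) for p in prompts]
        teacher_data = [(p, a) for p, a in teacher_data if a != UNKNOWN]
        # Single-step: every model is a student of the collaborative system.
        for m in models:
            m.distill(teacher_data)
    return models


if __name__ == "__main__":
    pool = [ToyModel("a", {"2+2?": "4"}),
            ToyModel("b", {"Capital of France?": "Paris"})]
    single_multi_loop(pool, ["2+2?", "Capital of France?"])
    print([m.generate("Capital of France?") for m in pool])
```

In this sketch each model starts with different knowledge and, after one iteration, every model can answer what any member of the pool could answer. A real system would replace the vote with one of the paper's collaboration strategies and the table update with an actual distillation method.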

Paper Structure

This paper contains 29 sections, 4 equations, 4 figures, 7 tables, and 1 algorithm.

Figures (4)

  • Figure 1: We propose the single-multi evolution loop. In the multi-step, multiple language models collaborate via a model collaboration algorithm $\mathcal{C}$ to generate better responses; in the single-step, we employ knowledge distillation, where each individual LM is the student and the model collaboration system is the teacher. By alternating and iterating the multi-step and the single-step, multiple LLMs collaboratively evolve into better models and better model collaboration systems.
  • Figure 2: Comparing the single-multi evolution loop with existing evolution strategies. Our strategy consistently outperforms the three methods by 7.7% on average.
  • Figure 3: How the skill dynamics change across evolution iterations, i.e., how many problems can be solved by at least one individual model or by the multi-LLM collaboration system. Results indicate better and more synergistic models and systems.
  • Figure 4: The performance of single individual models and multi-model systems across iterations, with a pool of three different sizes of Qwen-2.5. We employ multi-student KD and the BLEND dataset for evaluation.