The Single-Multi Evolution Loop for Self-Improving Model Collaboration Systems
Shangbin Feng, Kishan Panaganti, Yulia Tsvetkov, Wenhao Yu
TL;DR
The paper tackles the high cost of multi-LLM collaboration by distilling collaborative outputs into a single language model, so that inference needs only one model while the benefits of collaboration are preserved. It introduces the single-multi evolution loop, which alternates a multi-model collaboration step with a single-model distillation step; the distilled models then collaborate again, so the models continue to improve over successive iterations. In experiments across 7 collaboration strategies, 3 distillation methods, and 15 tasks, the approach improves individual models by 8.0% on average and the collaborative system by 14.9% on average, and it outperforms existing evolutionary AI methods by about 7.7% on average. The work reports strong improvements on reasoning and knowledge tasks and shows that the method is robust across model pools and collaboration modes. It also discusses practical implications for scalable, self-improving AI ecosystems and notes safety considerations for real-world deployment.
Abstract
Model collaboration, in which multiple language models (LMs) work together, combines the strengths of diverse models but incurs the cost of loading multiple LMs. We improve efficiency while preserving the strengths of collaboration by distilling collaborative patterns into a single model, which is trained on the outputs of the model collaboration system. At inference time, only the distilled model is employed: it imitates the collaboration while incurring only the cost of a single model. Furthermore, we propose the single-multi evolution loop: multiple LMs collaborate, each distills from the collaborative outputs, and these improved post-distillation LMs collaborate again, forming a collective evolution ecosystem where models evolve and self-improve by interacting with an environment of other models. Extensive experiments with 7 collaboration strategies and 15 tasks (QA, reasoning, factuality, etc.) demonstrate that: 1) individual models improve by 8.0% on average, absorbing the strengths of collaboration while reducing the cost to that of a single model; 2) the collaboration also benefits from the stronger and more synergistic LMs after distillation, improving over initial systems without evolution by 14.9% on average. Analysis reveals that the single-multi evolution loop outperforms various existing evolutionary AI methods, is compatible with diverse model, collaboration, and distillation settings, and helps solve problems that the initial model or system struggles with.
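The loop described in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: `ToyModel`, `collaborate`, and `single_multi_evolution` are hypothetical names, each "model" is just a lookup table, majority voting stands in for the paper's collaboration strategies, and memorizing the collaborative answers stands in for distillation by training. The sketch only shows the control flow: collaborate, distill into every model, then collaborate again with the updated models.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class ToyModel:
    """Hypothetical stand-in for an LM: a lookup table from prompt to answer."""
    name: str
    knowledge: dict = field(default_factory=dict)

    def generate(self, prompt: str) -> str:
        # Return the stored answer, or "unknown" if the prompt was never seen.
        return self.knowledge.get(prompt, "unknown")

    def distill(self, pairs: list) -> None:
        # Assumption: memorization stands in for fine-tuning on collaborative outputs.
        for prompt, answer in pairs:
            if answer != "unknown":
                self.knowledge[prompt] = answer


def collaborate(models: list, prompt: str) -> str:
    """Toy collaboration: majority vote over answers, ignoring 'unknown'."""
    votes = Counter(m.generate(prompt) for m in models)
    votes.pop("unknown", None)
    return votes.most_common(1)[0][0] if votes else "unknown"


def single_multi_evolution(models: list, prompts: list, iterations: int = 2) -> list:
    """Alternate a multi-model collaboration step with a per-model distillation step."""
    for _ in range(iterations):
        # Multi step: the whole pool answers every prompt together.
        pairs = [(p, collaborate(models, p)) for p in prompts]
        # Single step: each model distills from the collaborative outputs.
        for m in models:
            m.distill(pairs)
    return models


# Two toy models with complementary knowledge.
pool = [
    ToyModel("a", {"2+2": "4"}),
    ToyModel("b", {"capital of France": "Paris"}),
]
single_multi_evolution(pool, ["2+2", "capital of France"])
print({m.name: m.generate("capital of France") for m in pool})
```

After the loop, each toy model answers both prompts on its own, which mirrors the claim that a single distilled model can absorb the strengths of the pool. Real collaboration strategies and distillation objectives would replace the two stand-in functions.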
