MCCE: A Framework for Multi-LLM Collaborative Co-Evolution

Nian Ran, Zhongzheng Li, Yue Wang, Qingsong Ran, Xiaoyuan Zhang, Shikun Feng, Richard Allmendinger, Xiaoguang Zhao

TL;DR

The paper tackles the challenge of discrete, multi-objective optimization in vast search spaces by introducing Multi-LLM Collaborative Co-evolution (MCCE), which pairs a frozen, high-capacity API LLM with a lightweight trainable model in a closed-loop co-evolution. A memory of breakthrough trajectories guides the trainable model through experience-driven learning using Direct Preference Optimization, while the fixed LLM maintains global exploration, enabling continual improvement through mutual inspiration. Core contributions include the MCCE framework, an experience-driven learning paradigm with similarity-based data synthesis to stabilize training, and empirical demonstration of state-of-the-art Pareto front quality in five-objective molecular design. The results suggest a general and scalable paradigm for hybrid LLM systems that combine knowledge-driven exploration with experience-driven adaptation across complex discrete optimization domains.

Abstract

Multi-objective discrete optimization problems, such as molecular design, pose significant challenges due to their vast and unstructured combinatorial spaces. Traditional evolutionary algorithms often get trapped in local optima, while expert knowledge can provide crucial guidance for accelerating convergence. Large language models (LLMs) offer powerful priors and reasoning ability, making them natural optimizers when expert knowledge matters. However, closed-source LLMs, though strong in exploration, cannot update their parameters and thus cannot internalize experience. Conversely, smaller open models can be continually fine-tuned but lack broad knowledge and reasoning strength. We introduce Multi-LLM Collaborative Co-evolution (MCCE), a hybrid framework that unites a frozen closed-source LLM with a lightweight trainable model. The system maintains a trajectory memory of past search processes; the small model is progressively refined via reinforcement learning, with the two models jointly supporting and complementing each other in global exploration. Unlike model distillation, this process enhances the capabilities of both models through mutual inspiration. Experiments on multi-objective drug design benchmarks show that MCCE achieves state-of-the-art Pareto front quality and consistently outperforms baselines. These results highlight a new paradigm for enabling continual evolution in hybrid LLM systems, combining knowledge-driven exploration with experience-driven learning.
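
The TL;DR names Direct Preference Optimization (DPO) as the mechanism that refines the small model from stored experience. The paper's own training objective is not reproduced on this page, so the block below is only the standard DPO loss from the general literature, given as a reference point. The symbols are assumptions for illustration: $\pi_\theta$ is the trainable local model, $\pi_{\mathrm{ref}}$ a frozen reference copy of it, $x$ the prompt, $y_w$ and $y_l$ the preferred and rejected candidates of a preference pair drawn from a dataset $\mathcal{D}$, $\sigma$ the logistic function, and $\beta$ a temperature-like hyperparameter.

```latex
\mathcal{L}_{\mathrm{DPO}}(\theta)
  = -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
    \left[
      \log \sigma\!\left(
        \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
        - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
      \right)
    \right]
```

Minimizing this loss raises the likelihood of $y_w$ relative to $y_l$ while keeping $\pi_\theta$ close to the reference model. How MCCE builds the pairs $(y_w, y_l)$ from its trajectory memory is described in the paper and may differ from this generic setup.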

Paper Structure

This paper contains 28 sections, 19 equations, 5 figures, 1 table, 2 algorithms.

Figures (5)

  • Figure 1: Overview of the proposed MCCE framework. The system begins with user interaction and population initialization based on the problem definition and evaluation criteria. In the candidate generation stage, a frozen API-based LLM and a trainable local LLM collaborate to propose new molecules. These are evaluated by the multi-objective evaluation module, which applies Pareto selection to maintain a balanced population, while breakthrough solutions are stored as experience. In the update and learning stage, similarity-based data synthesis constructs preference pairs from past trajectories, and the local model is refined via DPO training. This creates a self-improving feedback loop where global exploration (API LLM) and local adaptation (trainable LLM) co-evolve toward progressively optimized solutions.
  • Figure 2: Overall performance comparison across different baselines. (Left) The curve of avg_top1 (mean $\pm$ std) shows that our DPO-enhanced co-evolutionary framework consistently outperforms all baselines, steadily increasing the average quality of the top-ranked molecule throughout the optimization process. (Right) The curve of hypervolume (mean $\pm$ std) further highlights the superiority of our approach: MCCE with DPO training achieves the largest Pareto front coverage, demonstrating both improved solution quality and diversity. In both metrics, our method significantly surpasses single-model baselines (e.g., Qwen2.5-7B-Instruct, GPT-4o-2024-05-13) as well as alternative co-evolution variants (SFT and RL), achieving state-of-the-art performance.
  • Figure 3: (Left) The co-evolutionary curve showing how the large LLM and local model complement each other to achieve superior trajectories. (Right) Output distribution analysis of molecules generated from the frozen LLM, the initial local model, and the fine-tuned local model.
  • Figure 4: Loss Analysis
  • Figure 5: Additional Co-evolutionary Curves
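
The Figure 1 caption mentions Pareto selection in the multi-objective evaluation module. As a rough illustration of that idea, and not the paper's implementation, the sketch below filters a population down to its non-dominated members. It assumes every objective is to be maximized and that each candidate has already been scored; the function names `dominates` and `pareto_front` and the two-objective example scores are hypothetical.

```python
from typing import List, Sequence


def dominates(a: Sequence[float], b: Sequence[float]) -> bool:
    """Return True if `a` Pareto-dominates `b` (maximization assumed).

    `a` must be at least as good as `b` on every objective and strictly
    better on at least one.
    """
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))


def pareto_front(scores: List[Sequence[float]]) -> List[int]:
    """Return the indices of score vectors that no other vector dominates."""
    return [
        i
        for i, s in enumerate(scores)
        if not any(dominates(t, s) for j, t in enumerate(scores) if j != i)
    ]


# Hypothetical scores for four candidates on two objectives.
scores = [
    (0.9, 0.2),
    (0.5, 0.5),
    (0.4, 0.4),  # dominated by (0.5, 0.5)
    (0.2, 0.9),
]
front = pareto_front(scores)
print(front)  # [0, 1, 3]
```

This pairwise check is quadratic in the population size, which is usually acceptable for small populations. The same functions work unchanged for five objectives, since they only compare score vectors element by element.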