Self-Evolving LLMs via Continual Instruction Tuning
Jiazheng Kang, Le Huang, Cheng Hou, Zhe Zhao, Zhenxiang Yan, Ting Bai
TL;DR
MoE-CL tackles catastrophic forgetting in self-evolving LLMs by pairing task-specific LoRA experts with a shared LoRA expert and enforcing cross-task alignment through a GAN-based task-aware discriminator. This adversarial Mixture-of-LoRA-Experts framework preserves task-specific knowledge in the dedicated experts while the shared expert carries transferable knowledge, so the model can adapt autonomously across sequential tasks. Empirical results on the MTL5 and Tencent3 benchmarks, plus offline A/B tests on Tencent content compliance review (CCR), show gains in accuracy and stability and a 15.3% reduction in manual review costs. The work demonstrates a practical path to industrial-scale self-evolution of LLMs that balances retention and generalization across diverse task distributions.
Abstract
In real-world industrial settings, large language models (LLMs) must learn continually to keep pace with diverse and evolving tasks, requiring self-evolution to refine knowledge under dynamic data distributions. However, existing continual learning (CL) approaches, such as replay and parameter isolation, often suffer from catastrophic forgetting: training on new tasks degrades performance on earlier ones by overfitting to the new distribution and weakening generalization.
We propose MoE-CL, a parameter-efficient adversarial mixture-of-experts framework for industrial-scale, self-evolving continual instruction tuning of LLMs. MoE-CL uses a dual-expert design: (1) a dedicated LoRA expert per task to preserve task-specific knowledge via parameter independence, mitigating forgetting; and (2) a shared LoRA expert to enable cross-task transfer. To prevent transferring task-irrelevant noise through the shared pathway, we integrate a task-aware discriminator within a GAN. The discriminator encourages the shared expert to pass only task-aligned information during sequential training. Through adversarial learning, the shared expert acquires generalized representations that mimic the discriminator, while dedicated experts retain task-specific details, balancing knowledge retention and cross-task generalization and thereby supporting self-evolution.
Extensive experiments on the public MTL5 benchmark and an industrial Tencent3 benchmark validate the effectiveness of MoE-CL for continual instruction tuning. In real-world A/B testing for content compliance review on the Tencent Video platform, MoE-CL reduced manual review costs by 15.3%. These results demonstrate that MoE-CL is practical for large-scale industrial deployment where continual adaptation and stable transfer are critical.
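The dual-expert design described in the abstract can be illustrated with a minimal NumPy sketch of a single layer's forward pass. This is not the authors' implementation: the class names, the additive combination of the two expert outputs, the linear discriminator, and all dimensions are assumptions made for illustration. The abstract does not specify how the discriminator signal enters the shared expert's training objective, so the sketch only computes the discriminator's task-classification loss on the shared features and leaves the adversarial update out.

```python
import numpy as np

rng = np.random.default_rng(0)


class LoRAExpert:
    """Low-rank update B @ A applied to an input vector (hypothetical helper)."""

    def __init__(self, d_in, d_out, rank):
        self.A = rng.normal(scale=0.02, size=(rank, d_in))
        # Common LoRA initialisation: B starts at zero, so the update is a no-op at first.
        self.B = np.zeros((d_out, rank))

    def __call__(self, x):
        return self.B @ (self.A @ x)


class DualExpertLayer:
    """Frozen base weight plus one LoRA expert per task and one shared LoRA expert."""

    def __init__(self, d_in, d_out, rank, num_tasks):
        self.W = rng.normal(scale=0.02, size=(d_out, d_in))  # frozen base weight
        self.task_experts = [LoRAExpert(d_in, d_out, rank) for _ in range(num_tasks)]
        self.shared_expert = LoRAExpert(d_in, d_out, rank)

    def forward(self, x, task_id):
        shared = self.shared_expert(x)
        specific = self.task_experts[task_id](x)
        # Assumption: the two expert outputs are simply added to the base output.
        return self.W @ x + specific + shared, shared


def discriminator_loss(shared_features, task_id, D):
    """Cross-entropy of a linear task classifier D applied to the shared features."""
    logits = D @ shared_features
    logits = logits - logits.max()  # subtract the max for numerical stability
    log_probs = logits - np.log(np.exp(logits).sum())
    return -log_probs[task_id]


layer = DualExpertLayer(d_in=8, d_out=4, rank=2, num_tasks=3)
x = np.ones(8)
output, shared_features = layer.forward(x, task_id=1)
loss = discriminator_loss(shared_features, task_id=1, D=np.ones((3, 4)))
```

Because each task routes through its own expert, updating the expert for one task leaves the outputs for every other task unchanged, which is the parameter-independence property the abstract credits with mitigating forgetting. Only the shared expert is touched by all tasks, which is why the framework places the discriminator on that pathway.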
