METEOR: Evolutionary Journey of Large Language Models from Guidance to Self-Growth

Jiawei Li, Xiaoang Xu, Yang Gao

TL;DR

METEOR tackles the challenge of producing cost-efficient domain-specific LLMs by evolving models from guided supervision to autonomous growth. It introduces a three-phase framework (weak-to-strong data distillation, iterative training with guided feedback, and self-evolution through inference-strategy optimization) that progressively expands domain capabilities. Empirical results on domains derived from Stack Overflow show substantial improvements in accuracy, completeness, relevance, coherence, and reliability, with GPT-4-based evaluation confirming the gains. The approach aligns the domain knowledge distributions of the strong and weak models to make distillation efficient, and it supports autonomous improvement with minimal external supervision, offering practical benefits for deploying domain-specific AI.

Abstract

Model evolution enables learning from feedback to refine experiences and update skills, transforming models from having no domain knowledge to becoming domain experts. However, there is currently no unified and effective method for guiding this evolutionary process. To address this gap, we propose the Meteor method, which includes three training phases: weak-to-strong data distillation, iterative training, and self-evolution strategies. Each phase maximizes the model's inherent domain capabilities, allowing it to autonomously refine its domain knowledge and enhance performance. Experiments demonstrate that our approach significantly improves accuracy, completeness, relevance, coherence, and reliability across domain-specific tasks.

Paper Structure

This paper contains 24 sections, 3 figures, 8 tables, 1 algorithm.

Figures (3)

  • Figure 1: Overview of the METEOR method, which is structured into three phases. First, weak-to-strong knowledge distillation is applied, and the distilled data is used to train the model, giving it an initial grasp of domain-specific capabilities. This is followed by iterative training, which further refines the model's domain expertise. Finally, self-training is conducted, enabling the model to reach the proficiency of a domain expert.
  • Figure 2: Illustration of the weak-to-strong knowledge distillation process. First, a domain question is input into the weak (domain) model to obtain a guideline. The strong model then uses this guideline, together with the original question, to generate and distill domain-specific data.
  • Figure 3: Illustration of the iterative evolution process guided by a strong model. Upon receiving domain-specific data, the model uses chain-of-thought (CoT) reasoning to generate answers and reasoning paths. These are evaluated by GPT-4, which confirms the answer if it is correct or offers suggestions for refinement if it is not. This iterative process continues until the answer is validated or the maximum iteration limit is reached.
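The two procedures described in the Figure 2 and Figure 3 captions can be sketched as control flow. This is a minimal illustration under stated assumptions, not the authors' implementation: the weak model, strong model, answer generator, and evaluator (GPT-4 in the paper) are replaced by plain Python callables, and the names `distill_example`, `iterative_refine`, `Verdict`, and the default `max_iters=3` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


# Hypothetical stand-ins: every "model" is just a function over text.
WeakModel = Callable[[str], str]               # question -> guideline
StrongModel = Callable[[str, str], str]        # (question, guideline) -> distilled answer


@dataclass
class Verdict:
    """Hypothetical evaluator output: accept, or reject with a refinement hint."""
    correct: bool
    suggestion: Optional[str] = None


Evaluator = Callable[[str, str], Verdict]          # (question, answer) -> verdict
Generator = Callable[[str, Optional[str]], str]    # (question, feedback) -> CoT answer


def distill_example(question: str, weak: WeakModel, strong: StrongModel) -> dict:
    """Weak-to-strong distillation (Figure 2): the weak domain model writes a
    guideline, and the strong model answers the question conditioned on it."""
    guideline = weak(question)
    answer = strong(question, guideline)
    return {"question": question, "guideline": guideline, "answer": answer}


def iterative_refine(question: str, generate: Generator, evaluate: Evaluator,
                     max_iters: int = 3) -> Tuple[str, bool, int]:
    """Guided iterative evolution (Figure 3): generate an answer, have the
    evaluator judge it, feed any suggestion back into the next attempt, and
    stop once the answer is accepted or max_iters attempts have been made.
    Returns (last_answer, accepted, attempts_used)."""
    feedback: Optional[str] = None
    answer = ""
    for attempt in range(1, max_iters + 1):
        answer = generate(question, feedback)
        verdict = evaluate(question, answer)
        if verdict.correct:
            return answer, True, attempt
        feedback = verdict.suggestion  # evaluator's refinement hint
    return answer, False, max_iters


if __name__ == "__main__":
    # Toy stand-ins (hypothetical) that only demonstrate the control flow.
    def toy_generate(question: str, feedback: Optional[str]) -> str:
        return "list.reverse()" if feedback else "reversed list"

    def toy_evaluate(question: str, answer: str) -> Verdict:
        if answer == "list.reverse()":
            return Verdict(True)
        return Verdict(False, "name the in-place method")

    print(iterative_refine("How do I reverse a list in place?",
                           toy_generate, toy_evaluate))
    # prints ('list.reverse()', True, 2)
```

Passing the models in as callables keeps the loop independent of any particular model API; in a real setup each callable would wrap an LLM call, and the accepted (question, answer) pairs would become training data for the next round.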