Semi-supervised Fine-tuning for Large Language Models

Junyu Luo, Xiao Luo, Xiusi Chen, Zhiping Xiao, Wei Ju, Ming Zhang

TL;DR

This work tackles the practical problem of fine-tuning large language models under hybrid data regimes, where labeled data are scarce and unlabeled data are plentiful. It introduces SemiEvol, a bi-level propagate-and-select framework that first propagates knowledge from labeled to unlabeled data via in-weight warm-up and in-context kNN retrieval, then mines unlabeled data through collaborative learning and adaptive selection using entropy-based confidence. By generating high-quality pseudo-responses from multiple LLMs and selectively incorporating them with labeled data, SemiEvol achieves consistent improvements over SFT and self-evolution across seven general and domain-specific tasks, including challenging domains like law and medicine. The framework supports continuous and iterative evolution, enabling practical deployment in real-world scenarios where unlabeled data accumulate over time. Overall, SemiEvol demonstrates a data-efficient path to improve LLM alignment and task performance in hybrid-data environments with measurable gains in reasoning, computation, and domain knowledge.
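The in-context half of the propagation step can be illustrated with a small retrieval sketch. It is not the authors' implementation: the toy two-dimensional embeddings, the cosine similarity measure, the prompt template, and the helper names `knn_demonstrations` and `build_prompt` are all assumptions made for illustration. It only shows the general idea of picking the k labeled examples nearest to an unlabeled query and placing them in the prompt as demonstrations.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_demonstrations(query_vec, labeled, k=2):
    """Return the k labeled examples whose embeddings are most similar to the query."""
    ranked = sorted(labeled, key=lambda ex: cosine(query_vec, ex["vec"]), reverse=True)
    return ranked[:k]


def build_prompt(question, demos):
    """Format retrieved labeled examples as demonstrations ahead of the new question."""
    parts = [f"Q: {d['question']}\nA: {d['answer']}" for d in demos]
    parts.append(f"Q: {question}\nA:")
    return "\n\n".join(parts)


# Toy labeled pool; a real system would obtain "vec" from an embedding model (assumed here).
labeled = [
    {"question": "2 + 2?", "answer": "4", "vec": [1.0, 0.0]},
    {"question": "Capital of France?", "answer": "Paris", "vec": [0.0, 1.0]},
    {"question": "3 + 5?", "answer": "8", "vec": [0.9, 0.1]},
]

demos = knn_demonstrations([1.0, 0.05], labeled, k=2)
prompt = build_prompt("7 + 6?", demos)
```

With these toy vectors, the two arithmetic examples are retrieved and the geography example is left out, so the model answers the new arithmetic question with related labeled examples in its context.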

Abstract

Supervised fine-tuning (SFT) is crucial in adapting large language models (LLMs) to a specific domain or task. However, only a limited amount of labeled data is available in practical applications, which makes it difficult for SFT to yield satisfactory results. A data-efficient framework that can fully exploit both labeled and unlabeled data for LLM fine-tuning is therefore highly desirable. To this end, we introduce a semi-supervised fine-tuning (SemiFT) task and a framework named SemiEvol, which aligns LLMs in a propagate-and-select manner. For knowledge propagation, SemiEvol adopts a bi-level approach, propagating knowledge from labeled data to unlabeled data through both in-weight and in-context methods. For knowledge selection, SemiEvol incorporates a collaborative learning mechanism that selects higher-quality pseudo-response samples. We conducted experiments using GPT-4o-mini and Llama-3.1 on seven general or domain-specific datasets, demonstrating significant improvements in model performance on target data. Furthermore, we compared SemiEvol with SFT and self-evolution methods, highlighting its practicality in hybrid data scenarios.
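The selection step can be sketched as a confidence filter over pseudo-responses. The version below is only an illustration under stated assumptions: it measures confidence as the Shannon entropy of the answers produced by several models for the same unlabeled question, and it keeps a sample when that entropy falls below a fixed threshold. The paper's actual confidence measure (which may be based on token probabilities) and its adaptive threshold rule are not given in this summary, and the names `vote_entropy`, `select_pseudo_labeled`, and `threshold` are hypothetical.

```python
import math
from collections import Counter


def vote_entropy(answers):
    """Shannon entropy (in nats) of the empirical answer distribution; lower means more agreement."""
    counts = Counter(answers)
    total = len(answers)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def select_pseudo_labeled(candidates, threshold):
    """Keep (question, majority answer) pairs whose answer entropy is at most the threshold."""
    selected = []
    for question, answers in candidates.items():
        if vote_entropy(answers) <= threshold:
            majority, _ = Counter(answers).most_common(1)[0]
            selected.append((question, majority))
    return selected


# Toy answers from four collaborating models for three unlabeled questions (made-up data).
candidates = {
    "q1": ["A", "A", "A", "A"],  # full agreement, entropy 0
    "q2": ["A", "B", "C", "D"],  # full disagreement, entropy ln(4)
    "q3": ["B", "B", "B", "C"],  # mostly agreement
}

kept = select_pseudo_labeled(candidates, threshold=0.6)
```

Here `q1` and `q3` pass the filter with their majority answers, while `q2` is discarded because the models disagree completely. The retained pairs would then be mixed with the labeled data for the next round of fine-tuning.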

Paper Structure

This paper contains 32 sections, 9 equations, 7 figures, 7 tables, 1 algorithm.

Figures (7)

  • Figure 1: Comparison of SemiEvol with previous SFT methods. SemiEvol enables interaction between diverse data types for superior performance evolution.
  • Figure 2: Overview of SemiEvol. It maximizes the utility of labeled data through a bi-level knowledge propagation-and-selection framework, while leveraging collaborative learning among multiple LLMs to exploit unlabeled data, thereby unleashing the full data potential.
  • Figure 3: Sensitivity analysis of SemiEvol's performance under different $n$ and $\theta$ on various datasets.
  • Figure 4: Entropy distribution indicates that SemiEvol can enhance response confidence. Lower entropy values indicate more confident predictions.
  • Figure 5: Stability analysis via mean performance and standard deviation across multiple inference prompts.
  • ...and 2 more figures