RobustFT: Robust Supervised Fine-tuning for Large Language Models under Noisy Response

Junyu Luo, Xiao Luo, Kaize Ding, Jingyang Yuan, Zhiping Xiao, Ming Zhang

TL;DR

RobustFT tackles the pervasive problem of noisy data in supervised fine-tuning of large language models by combining multi-expert noise detection with context-enhanced denoising and entropy-based sample selection. The framework identifies potentially noisy samples via a collaboration of reasoning-enabled LLMs and a consistency checker, then relabels and screens them through a context-informed Review Agent to produce a denoised fine-tuning dataset. Across five diverse benchmarks and multiple model backbones, RobustFT consistently outperforms baselines, with ablations revealing the critical roles of its detection, denoising, and selection components. The work demonstrates that robust data curation during SFT markedly improves downstream task performance and stability, especially for smaller models, underscoring its practical importance for real-world LLM deployment.
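The consistency-checking idea can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the string normalization, and the unanimity rule (flag a sample unless the given label and every expert answer agree) are all assumptions made for illustration, and the expert answers are assumed to have already been produced by separate LLM calls.

```python
def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(answer.strip().lower().split())


def is_potentially_noisy(label: str, expert_answers: list[str]) -> bool:
    """Hypothetical consistency check (an assumption, not the paper's exact rule).

    Returns True unless the dataset label and every expert answer
    normalize to the same string.
    """
    distinct = {normalize(label)} | {normalize(a) for a in expert_answers}
    return len(distinct) > 1


# The label agrees with both experts, so the sample is treated as clean.
clean = is_potentially_noisy("B", ["b", " B "])
# One expert disagrees, so the sample is sent on for denoising.
noisy = is_potentially_noisy("B", ["b", "C"])
```

Exact-match agreement is the strictest possible rule; a softer vote threshold or a semantic comparison would flag fewer samples at the cost of letting more noise through.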

Abstract

Supervised fine-tuning (SFT) plays a crucial role in adapting large language models (LLMs) to specific domains or tasks. However, as demonstrated by empirical experiments, the collected data inevitably contains noise in practical applications, which poses significant challenges to model performance on downstream tasks. Therefore, there is an urgent need for a noise-robust SFT framework to enhance model capabilities in downstream tasks. To address this challenge, we introduce a robust SFT framework (RobustFT) that performs noise detection and relabeling on downstream task data. For noise identification, our approach employs a multi-expert collaborative system with inference-enhanced models to achieve superior noise detection. In the denoising phase, we utilize a context-enhanced strategy, which incorporates the most relevant and confident knowledge followed by careful assessment to generate reliable annotations. Additionally, we introduce an effective data selection mechanism based on response entropy, ensuring only high-quality samples are retained for fine-tuning. Extensive experiments conducted on multiple LLMs across five datasets demonstrate RobustFT's exceptional performance in noisy scenarios.
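As a rough illustration of selection based on response entropy, the sketch below scores each response by the mean Shannon entropy of its per-token probability distributions and keeps the lowest-entropy fraction. The abstract does not give the exact scoring or threshold, so the function names, the `keep_ratio` parameter, and the input layout (a list of per-token distributions per sample) are assumptions rather than the paper's definitions.

```python
import math


def mean_token_entropy(token_dists: list[list[float]]) -> float:
    """Average Shannon entropy (in nats) over a response's per-token distributions.

    Assumes token_dists is non-empty and each inner list sums to 1.
    """
    total = 0.0
    for dist in token_dists:
        total += -sum(p * math.log(p) for p in dist if p > 0.0)
    return total / len(token_dists)


def select_low_entropy(samples, keep_ratio: float = 0.5) -> list[str]:
    """Keep the ids of the lowest-entropy fraction of samples.

    samples: list of (sample_id, token_dists) pairs.
    keep_ratio: hypothetical knob for the fraction retained (an assumption).
    """
    ranked = sorted(samples, key=lambda s: mean_token_entropy(s[1]))
    n_keep = max(1, math.ceil(keep_ratio * len(ranked)))
    return [sample_id for sample_id, _ in ranked[:n_keep]]


# Toy data: one token per response, two-way vocabulary.
samples = [
    ("a", [[1.0, 0.0]]),  # fully confident, entropy 0
    ("b", [[0.5, 0.5]]),  # maximally uncertain
    ("c", [[0.9, 0.1]]),
    ("d", [[0.6, 0.4]]),
]
kept = select_low_entropy(samples, keep_ratio=0.5)
```

Low entropy is used here as a proxy for model confidence; in practice the per-token distributions would come from the model's output logits rather than being written by hand.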

Paper Structure

This paper contains 29 sections, 9 equations, 6 figures, 3 tables, 1 algorithm.

Figures (6)

  • Figure 1: Impact of noisy data on LLM performance during SFT. Increasing noise levels deteriorates model performance, highlighting the critical need for noise-robust fine-tuning approaches.
  • Figure 2: Overview of RobustFT. Our RobustFT enhances model performance through a two-stage noise detection-and-denoising framework, leveraging collaborative learning among expert LLMs for noise detection and context-enhanced reasoning for data denoising, ultimately enabling robust downstream fine-tuning.
  • Figure 3: Sensitivity analysis on MMLU under different $\beta$ and $k$ with varying noise levels.
  • Figure 4: Perplexity analysis of RobustFT on MMLU and ARC with varying noise levels.
  • Figure 5: Category-wise performance of RobustFT.
  • ...and 1 more figure