Middo: Model-Informed Dynamic Data Optimization for Enhanced LLM Fine-Tuning via Closed-Loop Learning

Zinan Tang, Xin Gao, Qizhi Pei, Zhuoshi Pan, Mengzhang Cai, Jiang Wu, Conghui He, Lijun Wu

TL;DR

This work addresses the dependence of LLM fine-tuning on high-quality data and the inadequacy of static data curation. It proposes Middo, a closed-loop, model-informed dynamic data optimization framework that uses tri-axial signals—loss patterns for complexity, embedding cluster dynamics for diversity, and self-alignment scores for quality—to adapt seed data in tandem with model progress. The methodology integrates complexity refinement, diversity augmentation, and quality improvement into an iterative loop, preserving dataset size while enhancing learning value. Across multiple benchmarks and base models, Middo yields consistent accuracy gains (e.g., $7.15\%$ on Alpaca with LLaMA-3.1-8B) and demonstrates robust data-quality improvements without increasing data volume, signaling a shift toward sustainable data-model co-evolution in LLM training.

Abstract

Supervised fine-tuning (SFT) of large language models (LLMs) fundamentally relies on high-quality training data. While data selection and data synthesis are two common strategies for improving data quality, existing approaches are often limited by static dataset curation that fails to adapt to evolving model capabilities. In this paper, we introduce Middo, a self-evolving, model-informed dynamic data optimization framework that uses model-aware data selection and context-preserving data refinement. Unlike conventional one-off filtering/synthesis methods, our framework establishes a closed-loop optimization system: (1) a self-referential diagnostic module proactively identifies suboptimal samples through tri-axial model signals, namely loss patterns (complexity), embedding cluster dynamics (diversity), and self-alignment scores (quality); (2) an adaptive optimization engine then transforms suboptimal samples into pedagogically valuable training points while preserving semantic integrity; (3) the optimization process continuously evolves with model capability through dynamic learning principles. Experiments on multiple benchmarks demonstrate that Middo consistently enhances the quality of seed data and boosts LLM performance, improving accuracy by 7.15% on average while maintaining the original dataset scale. This work establishes a new paradigm for sustainable LLM training through dynamic human-AI co-evolution of data and models. Our datasets, models, and code are publicly available at https://github.com/Word2VecT/Middo.
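
The three-step loop in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the `Signals` fields, the z-score and floor thresholds, and the `train_and_score` and `refine` callables are all hypothetical stand-ins for the fine-tuning, scoring, and rewriting steps, which the abstract does not specify. Rewriting flagged samples in place is likewise just one simple way to keep the dataset size fixed.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class Signals:
    """Per-sample diagnostics, assumed to be computed by the current model."""
    loss: float      # complexity proxy: loss on the sample
    sparsity: float  # diversity proxy: how isolated the sample is in embedding space
    quality: float   # quality proxy: self-alignment score, higher is better


def diagnose(signals, loss_z=1.0, sparsity_z=1.0, quality_floor=0.5):
    """Flag sample indices along each axis (threshold values are assumptions)."""
    def cutoff(values, z):
        # Flag values more than z population standard deviations above the mean.
        return mean(values) + z * pstdev(values)

    loss_cut = cutoff([s.loss for s in signals], loss_z)
    sparse_cut = cutoff([s.sparsity for s in signals], sparsity_z)
    return {
        "too_complex": [i for i, s in enumerate(signals) if s.loss > loss_cut],
        "sparse": [i for i, s in enumerate(signals) if s.sparsity > sparse_cut],
        "low_quality": [i for i, s in enumerate(signals) if s.quality < quality_floor],
    }


def closed_loop(dataset, train_and_score, refine, rounds=3):
    """Alternate model-informed diagnosis and data refinement.

    train_and_score(dataset) -> list[Signals]  (hypothetical: fine-tune, then score)
    refine(sample, axis)     -> sample         (hypothetical: rewrite one sample)
    """
    for _ in range(rounds):
        flagged = diagnose(train_and_score(dataset))
        for axis, indices in flagged.items():
            for i in indices:
                # In-place replacement keeps the dataset size constant.
                dataset[i] = refine(dataset[i], axis)
    return dataset
```

Because the signals are recomputed in every round, the set of flagged samples changes as the model improves, which is the sense in which the loop is "closed".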

Paper Structure

This paper contains 52 sections, 13 figures, 10 tables.

Figures (13)

  • Figure 1: Comparison of different datasets and different models before and after Middo optimization.
  • Figure 2: The Middo pipeline: a closed-loop, iterative dynamic optimization framework for LLM fine-tuning. It comprises three core modules that leverage model feedback: Loss Patterns identify overly complex samples, which are then simplified; Self-alignment Scores evaluate data quality, transforming low-quality samples into high-quality ones; and Embedding Cluster Dynamics detect sparse data points and expand the data distribution through targeted augmentation. Middo ensures the training set continually evolves to better align with the model’s capabilities.
  • Figure 3: Performance comparison of Middo on the Alpaca dataset with varying refined data sizes. The x-axis represents the number and percentage of data selected for refinement, while the y-axis shows the average accuracy across three iterations. To ensure fairness, we guarantee that the data after refinement is the same.
  • Figure 4: Loss distribution comparison before and after applying Middo. The density curve reflects the relative frequency of data points within specific loss intervals. The inset subfigure highlights the maximum loss reduction from $12.99$ to $3.76$.
  • Figure 5: t-SNE visualization of the Alpaca dataset before and after applying Middo. The original dataset is shown in light blue, while the augmented data is in dark blue. The dark blue points tend to occupy the sparsely populated regions of the light blue point distribution.
  • ...and 8 more figures
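
The captions for Figures 2 and 5 refer to detecting sparse data points in embedding space. One common way to score sparsity, shown here only as an illustrative assumption, is the mean distance from each embedding to its k nearest neighbours; the captions do not state which measure Middo actually uses, and `knn_sparsity` and its `k` parameter are hypothetical names.

```python
import math


def knn_sparsity(embeddings, k=2):
    """Mean Euclidean distance from each point to its k nearest neighbours.

    Larger scores indicate points in sparsely populated regions.
    This is an assumed proxy, not necessarily the measure used by Middo.
    """
    if len(embeddings) < 2:
        raise ValueError("need at least two embeddings")
    scores = []
    for i, p in enumerate(embeddings):
        # Distances to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(embeddings) if j != i)
        nearest = dists[:k]
        scores.append(sum(nearest) / len(nearest))
    return scores


# Toy 2-D "embeddings": three clustered points and one isolated point.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0)]
scores = knn_sparsity(points, k=2)
most_isolated = scores.index(max(scores))  # index of the isolated point
```

This brute-force version is quadratic in the number of samples; on a dataset the size of Alpaca an approximate nearest-neighbour index would be the practical choice.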