Middo: Model-Informed Dynamic Data Optimization for Enhanced LLM Fine-Tuning via Closed-Loop Learning
Zinan Tang, Xin Gao, Qizhi Pei, Zhuoshi Pan, Mengzhang Cai, Jiang Wu, Conghui He, Lijun Wu
TL;DR
This work addresses the dependence of LLM fine-tuning on high-quality data and the inability of static data curation to keep pace with a changing model. It proposes Middo, a closed-loop, model-informed dynamic data optimization framework that uses three model signals (loss patterns for complexity, embedding cluster dynamics for diversity, and self-alignment scores for quality) to adapt seed data in tandem with model progress. The method integrates complexity refinement, diversity augmentation, and quality improvement into an iterative loop that preserves dataset size while increasing the learning value of the data. Across multiple benchmarks and base models, Middo yields consistent accuracy gains (e.g., 7.15% on Alpaca with LLaMA-3.1-8B) and improves data quality without increasing data volume, pointing toward sustainable data-model co-evolution in LLM training.
Abstract
Supervised fine-tuning (SFT) of large language models (LLMs) fundamentally relies on high-quality training data. Data selection and data synthesis are the two common strategies for improving data quality, but existing approaches are typically limited to static dataset curation that fails to adapt to evolving model capabilities. In this paper, we introduce Middo, a self-evolving, model-informed dynamic data optimization framework that combines model-aware data selection with context-preserving data refinement. Unlike conventional one-off filtering or synthesis methods, our framework establishes a closed-loop optimization system: (1) a self-referential diagnostic module proactively identifies suboptimal samples through tri-axial model signals: loss patterns (complexity), embedding cluster dynamics (diversity), and self-alignment scores (quality); (2) an adaptive optimization engine then transforms the suboptimal samples into pedagogically valuable training points while preserving semantic integrity; (3) the optimization process continuously evolves with model capability through dynamic learning principles. Experiments on multiple benchmarks demonstrate that Middo consistently enhances the quality of seed data and boosts LLM performance, improving accuracy by 7.15% on average while maintaining the original dataset scale. This work establishes a new paradigm for sustainable LLM training through dynamic human-AI co-evolution of data and models. Our datasets, models, and code are publicly available at https://github.com/Word2VecT/Middo.
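To make the closed loop described above concrete, the following is a minimal sketch of one diagnose-then-refine round, not the released Middo implementation. Everything named here is an assumption for illustration: the `Sample` fields, the `diagnose` and `optimize_round` helpers, the thresholds, and the specific rules (high loss flags complexity, a large nearest-neighbor distance in embedding space flags diversity, a low self-alignment score flags quality). The `refine` callable is a hypothetical stand-in for the LLM-based rewriting step.

```python
import math
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence, Set


@dataclass
class Sample:
    text: str
    loss: float                 # per-sample loss under the current model (complexity signal)
    embedding: Sequence[float]  # embedding from the current model (diversity signal)
    quality: float              # self-alignment score, assumed to lie in [0, 1] (quality signal)


def nearest_neighbor_distance(i: int, samples: List[Sample]) -> float:
    """Euclidean distance from sample i to its closest other sample."""
    return min(
        math.dist(samples[i].embedding, s.embedding)
        for j, s in enumerate(samples)
        if j != i
    )


def diagnose(
    samples: List[Sample],
    loss_hi: float,
    sparse_radius: float,
    quality_lo: float,
) -> Dict[str, Set[int]]:
    """Return indices of samples flagged on each of the three axes.

    The flagging rules are illustrative assumptions, not the paper's criteria.
    """
    flags: Dict[str, Set[int]] = {"complexity": set(), "diversity": set(), "quality": set()}
    for i, s in enumerate(samples):
        if s.loss > loss_hi:
            flags["complexity"].add(i)
        if len(samples) > 1 and nearest_neighbor_distance(i, samples) > sparse_radius:
            flags["diversity"].add(i)
        if s.quality < quality_lo:
            flags["quality"].add(i)
    return flags


def optimize_round(
    samples: List[Sample],
    refine: Callable[[Sample], Sample],
    **thresholds: float,
) -> List[Sample]:
    """Replace every flagged sample with its refined version.

    Each flagged sample is swapped one-for-one, so the dataset size is unchanged.
    """
    flags = diagnose(samples, **thresholds)
    flagged = set().union(*flags.values())
    return [refine(s) if i in flagged else s for i, s in enumerate(samples)]
```

In a full loop, one would presumably fine-tune on the output of `optimize_round`, recompute `loss`, `embedding`, and `quality` with the updated model, and repeat, which is what makes the signals track the model's current capability rather than a fixed snapshot.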
