Complexity-aware fine-tuning
Andrey Goncharov, Daniil Vyazhev, Petr Sychev, Edvard Khalafyan, Alexey Zaytsev
TL;DR
This work tackles the challenge of efficiently fine-tuning domain-adapted LLMs under resource constraints by introducing a complexity-aware pipeline that uses single-token answer entropy to split data into regular and hard categories. Regular (easy) data receive standard supervised fine-tuning (SFT), while hard data are trained on chain-of-thought distilled from a larger model, so that reasoning is applied only where it is needed. Across two open 3B-scale models and the MMLU-Pro benchmark, the approach outperforms standard SFT and curriculum baselines and matches distillation performance while using up to $81\%$ less data. The study also conducts sensitivity analyses on alternative complexity metrics and highlights the practical viability of entropy-based complexity signals for data curation and efficient fine-tuning.
Abstract
General-purpose Large Language Models (LLMs) are frequently fine-tuned through supervised fine-tuning (SFT) to enhance performance in specific domains. Better results can be achieved by distilling the chain-of-thought of a larger model, at the cost of numerous expensive calls and a much greater amount of data. We propose a novel blueprint for efficient fine-tuning that uses reasoning only for complex data identified by entropy. Specifically, across two small open models ($\approx 3$B parameters), we split the training data into complexity categories by single-token answer entropy (ROC AUC $0.73$), fine-tune the models via SFT and distillation, and show that our pipeline significantly outperforms the standard SFT approach ($0.58$ vs $0.45$ average accuracy) and outperforms the distillation approach ($0.58$ vs $0.56$ average accuracy) while using $81\%$ less data.
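The entropy-based split described above can be illustrated with a minimal sketch. It rests on several assumptions that the abstract does not state: that the model's logits over the candidate answer tokens at the single answer position are available, that entropy is the Shannon entropy of their softmax, and that a single threshold separates the two groups. The names `answer_entropy`, `split_by_entropy`, the field `option_logits`, and the threshold value are hypothetical and chosen only for illustration.

```python
import math


def answer_entropy(option_logits):
    """Shannon entropy (in nats) of the softmax over answer-option logits.

    Assumption: `option_logits` holds the model's logits for each candidate
    answer token (e.g. "A", "B", "C", ...) at the single answer position.
    """
    m = max(option_logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in option_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def split_by_entropy(examples, threshold):
    """Route low-entropy examples to SFT and high-entropy ones to distillation.

    `threshold` is a hypothetical hyperparameter, not a value from the paper.
    """
    easy, hard = [], []
    for ex in examples:
        h = answer_entropy(ex["option_logits"])
        (hard if h > threshold else easy).append(ex)
    return easy, hard


# Toy data: one confident prediction and one maximally uncertain prediction.
examples = [
    {"id": "q1", "option_logits": [10.0, 0.0, 0.0, 0.0]},
    {"id": "q2", "option_logits": [0.0, 0.0, 0.0, 0.0]},
]
easy, hard = split_by_entropy(examples, threshold=0.5)
print([ex["id"] for ex in easy], [ex["id"] for ex in hard])
```

In this sketch, the `easy` subset would be used for plain SFT on the gold answers, while only the `hard` subset would be sent to the larger model for chain-of-thought generation, which is where the reduction in distilled data would come from.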
