The Right Time Matters: Data Arrangement Affects Zero-Shot Generalization in Instruction Tuning
Bingxiang He, Ning Ding, Cheng Qian, Jia Deng, Ganqu Cui, Lifan Yuan, Haiwen Hong, Huan-ang Gao, Longtao Huang, Hui Xue, Huimin Chen, Zhiyuan Liu, Maosong Sun
TL;DR
This work reframes zero-shot generalization in instruction tuning from a task-centered view to a data-centric one, showing that generalization emerges very early in training and is tracked more stably by loss than by traditional metrics. It analyzes how the arrangement of training data, in terms of similarity to test data and granularity of instructions, leads to rapid or delayed generalization, and shows that exposing the model early to high-similarity, fine-grained data yields stronger performance on unseen tasks. The authors introduce Test-centric Multi-turn Arrangement (TMA), a framework that organizes training data around test data characteristics to promote continual learning and further loss reduction, with strong empirical gains across multiple datasets. Overall, the study offers a principled data-centric approach to improving zero-shot generalization in instruction-tuned LLMs and highlights practical considerations for data curation and training strategies.
Abstract
Understanding alignment techniques begins with understanding the zero-shot generalization brought by instruction tuning, yet its mechanism remains poorly understood. Existing work has largely been confined to the task level, without considering that tasks are artificially defined and, to LLMs, merely consist of tokens and representations. To bridge this gap, we investigate zero-shot generalization from the perspective of the data itself. We first demonstrate that zero-shot generalization happens very early during instruction tuning, with loss serving as a stable indicator. Next, we investigate training data arrangement from the perspectives of similarity and granularity, confirming that the timing of exposure to certain training examples may greatly facilitate generalization on unseen tasks. Finally, we propose a more grounded training data arrangement framework, Test-centric Multi-turn Arrangement, and show its effectiveness in promoting continual learning and further loss reduction. For the first time, we show that zero-shot generalization during instruction tuning is a form of similarity-based generalization between training and test data at the instance level. Our code is released at https://github.com/thunlp/Dynamics-of-Zero-Shot-Generalization.
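
The abstract does not spell out how a test-centric arrangement is computed, so the following is only a rough sketch of one plausible reading, not the released implementation. It assumes each training and test example has already been mapped to an embedding vector, and it assumes cosine similarity as the similarity measure; the names `cosine` and `arrange_rounds` are hypothetical. In each round, every test example claims its most similar training example that is still unclaimed, so earlier rounds hold the training data closest to the test set.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def arrange_rounds(train_vecs, test_vecs):
    """Group training indices into rounds, most test-similar first.

    Hypothetical helper: in each round, every test vector claims the
    most similar training vector that has not been claimed yet. Rounds
    continue until the training pool is exhausted.
    """
    if not test_vecs:
        raise ValueError("need at least one test vector")
    remaining = set(range(len(train_vecs)))
    rounds = []
    while remaining:
        picked = []
        for t in test_vecs:
            if not remaining:
                break
            # sorted() makes tie-breaking deterministic (lowest index wins)
            best = max(sorted(remaining), key=lambda i: cosine(train_vecs[i], t))
            remaining.remove(best)
            picked.append(best)
        rounds.append(picked)
    return rounds


# Toy 2-D "embeddings": two training points near each test point.
train = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
test = [[1.0, 0.0], [0.0, 1.0]]
rounds = arrange_rounds(train, test)
```

Under these assumptions, training on the rounds in order exposes the model to the most test-similar examples first, while reversing the order delays that exposure, which is the kind of timing contrast the abstract refers to.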
