Exploring Format Consistency for Instruction Tuning

Shihao Liang; Runchu Tian; Kunlun Zhu; Yujia Qin; Huadong Wang; Xin Cong; Zhiyuan Liu; Xiaojiang Liu; Maosong Sun

TL;DR

This work addresses format inconsistency across instruction-tuning datasets and proposes Unified Instruction Tuning (UIT), a framework that uses GPT3.5 in-context learning to transfer diverse formats into a unified target format. It further introduces a perplexity-based denoising strategy and an offline GPT-J distillation model to reduce reliance on API calls while preserving transfer quality. Across multiple benchmarks, UIT improves generalization to unseen instructions and outperforms baselines, demonstrating the importance of format consistency in instruction tuning. The paper offers practical, cost-aware techniques for scalable instruction-following models and provides insights into how target formats, model size, and task diversity interact to affect transfer performance.
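The in-context format transfer can be illustrated with a small prompt builder. This is a minimal sketch, not the authors' prompt: Figure 3 describes prompting the model with 3 parallel examples and asking it to generate the target instruction for the 4th. The function name `build_transfer_prompt` and the `Example` / `Source:` / `Target:` layout are assumptions made for illustration. The model call itself is omitted.

```python
from typing import List, Tuple


def build_transfer_prompt(
    seed_pairs: List[Tuple[str, str]], source_instruction: str
) -> str:
    """Assemble a few-shot prompt from parallel (source, target) examples.

    The layout below is a hypothetical template, not the one used in the paper.
    """
    blocks = []
    for i, (src, tgt) in enumerate(seed_pairs, start=1):
        blocks.append(f"Example {i}\nSource: {src}\nTarget: {tgt}")
    # The final block leaves the target empty for the model to complete.
    blocks.append(
        f"Example {len(seed_pairs) + 1}\nSource: {source_instruction}\nTarget:"
    )
    return "\n\n".join(blocks)
```

With three seed pairs, the returned string contains four `Source:` fields and ends with an open `Target:` field, which is the text a language model would be asked to continue.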

Abstract

Instruction tuning has emerged as a promising approach to enhancing large language models in following human instructions. Increasing the diversity and number of instructions in the training data has been shown to consistently enhance generalization performance, which has motivated recent efforts to collect various instructions and integrate existing instruction tuning datasets into larger collections. However, different users have their own ways of expressing instructions, and instruction styles and formats often vary across datasets, i.e., format inconsistency. In this work, we propose a framework named Unified Instruction Tuning (UIT), which calls OpenAI APIs for automatic format transfer among different instruction tuning datasets such as PromptSource, FLAN and CrossFit. With the framework, we (1) demonstrate the necessity of maintaining format consistency in instruction tuning; (2) improve the generalization performance on unseen instructions on T5-LM-xl; and (3) provide a novel perplexity-based denoising method that reduces the noise of automatic format transfer, making the UIT framework more practical, together with a smaller offline model based on GPT-J that achieves format transfer capability comparable to OpenAI APIs, reducing costs in practice. Further analysis regarding variations of targeted formats and other effects is intended.
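One way to picture a perplexity-based denoising step is to sample several transferred candidates and keep the one a scoring model finds most fluent. The sketch below is an assumption-laden illustration, not the paper's procedure: it assumes per-token log-probabilities for each candidate are already available from some scoring model, and that the lowest-perplexity candidate is the one to keep. The names `perplexity` and `select_lowest_perplexity` are hypothetical.

```python
import math
from typing import Dict, List


def perplexity(token_logprobs: List[float]) -> float:
    """Perplexity as exp of the negative mean token log-probability."""
    if not token_logprobs:
        raise ValueError("need at least one token log-probability")
    return math.exp(-sum(token_logprobs) / len(token_logprobs))


def select_lowest_perplexity(candidates: Dict[str, List[float]]) -> str:
    """Return the candidate text whose token log-probs give the lowest perplexity.

    `candidates` maps each sampled instruction to its per-token log-probs
    (assumed to come from an external scoring model, not computed here).
    """
    return min(candidates, key=lambda text: perplexity(candidates[text]))
```

Drawing more samples per instruction gives the selection step more candidates to choose from, at the cost of more generation calls.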

Paper Structure (31 sections, 1 equation, 6 figures, 7 tables)

Introduction
Related Work
Instruction Tuning
Data Augmentation
Synthetic Data Denoising
Instruction Format Inconsistency
Framework and Experiments
Unified Instruction Format Transfer
Experiments
Settings
Datasets
Baselines
Results and Analyses
Limitations in Practice
Denoising for Format Transfer
...and 16 more sections

Figures (6)

Figure 1: The proposed format transfer framework is applied to two settings: testing-time transfer and training-time transfer. $s_{1},\cdots, s_{N}$ denote the training data in the original instruction format, $t_{1}, \cdots, t_{N}$ denote all the transferred training data in target format.
Figure 2: Transferring instruction formats with UIT. The existing instruction formats exhibit variations across different datasets, which can be classified into three distinct hierarchical formats: Task level, Instance level, and Keywords level. UIT leverages seed parallel data to conduct format transfer across different formats automatically.
Figure 3: An example of format transfer using GPT3.5, where we prompt the model with 3 parallel examples to generate the target instruction for the 4th example.
Figure 4: The performance of the denoising strategy at testing and training time with different numbers of samples. Detailed results in the form of a table are presented in Section B of the appendices.
Figure 5: Results of T5-LM of different model sizes on the testing-time transfer setting.
...and 1 more figures
