Enhancing and Assessing Instruction-Following with Fine-Grained Instruction Variants

Jiuding Yang; Weidong Guo; Kaitong Yang; Xiangyang Li; Yu Xu; Di Niu

TL;DR

DeMoRecon is a data augmentation framework that decomposes complex instructions into sub-components, modifies individual elements, and reconstructs them into instruction variants. The variants preserve contextual integrity while injecting the targeted variability essential for fine-grained instruction-following.

Abstract

The effective alignment of Large Language Models (LLMs) with precise instructions is essential for their application in diverse real-world scenarios. Current methods focus on enhancing the diversity and complexity of training and evaluation samples, yet they fall short in accurately assessing LLMs' ability to follow similar instruction variants. We introduce DeMoRecon, an effective data augmentation technique that decomposes complex instructions into simpler sub-components, modifies these, and reconstructs them into new variants. This preserves the original instruction's context and complexity while introducing variability, which is critical for training and evaluating LLMs' instruction-following precision. Based on DeMoRecon, we developed the FGIV dataset, which contains fine-grained instruction variants of 1,773 seed instructions, to both fine-tune and evaluate LLMs. Our findings show that LLMs fine-tuned with FGIV gain a significant performance boost on both our benchmark and commonly used instruction-following benchmarks.
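
The decompose, modify, reconstruct loop described above can be illustrated with a toy sketch. The abstract does not say how each step is carried out, so everything here is an assumption made for illustration: the semicolon-based splitting, the hand-supplied modifications, and the names `decompose`, `reconstruct`, `make_variants`, and `Variant` are all hypothetical and are not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Variant:
    instruction: str    # the reconstructed instruction text
    changed_index: int  # index of the sub-instruction that was modified


def decompose(instruction: str) -> List[str]:
    """Split an instruction into sub-instructions.

    Assumption: sub-instructions are separated by semicolons. This naive
    rule stands in for whatever decomposition the real pipeline uses.
    """
    return [part.strip() for part in instruction.split(";") if part.strip()]


def reconstruct(parts: List[str]) -> str:
    """Join sub-instructions back into one instruction string."""
    return "; ".join(parts)


def make_variants(instruction: str, modifications: Dict[int, str]) -> List[Variant]:
    """Build one variant per modification, changing a single sub-instruction.

    `modifications` maps a sub-instruction index to its replacement text.
    Every other sub-instruction is left untouched, so each variant keeps
    the context of the seed instruction.
    """
    parts = decompose(instruction)
    variants = []
    for index, new_text in sorted(modifications.items()):
        if not 0 <= index < len(parts):
            raise IndexError(f"no sub-instruction at index {index}")
        edited = parts.copy()
        edited[index] = new_text
        variants.append(Variant(reconstruct(edited), index))
    return variants


seed = "Write a poem about autumn; use exactly four lines; end with a question"
for variant in make_variants(seed, {1: "use exactly six lines"}):
    print(variant.changed_index, "->", variant.instruction)
```

Changing one sub-instruction at a time is what keeps a variant close to its seed: the two instructions differ in a single constraint, so a response that satisfies one may fail the other.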

Paper Structure (32 sections, 3 equations, 3 figures, 7 tables)

Introduction
Approach
Seed Preparation
Instruction Augmentation
Response Collection
Statistic and Analysis
Experiments
Baselines
Benchmarks
Experimental Settings
Experimental Results
Ablation Study
Related Work
Alignment of Large Language Models
Instruction-Following
...and 17 more sections

Figures (3)

Figure 1: An illustration of the proposed method of constructing FGIV.
Figure 2: The t-SNE plots illustrate the semantic embeddings generated by FGIV.
Figure 3: A real example from FGIV-Eval. The base model is LLaMA-2-7B-Chat. We show the prediction results of the original model and its DPO-tuned version using FGIV-R. A response in a red box indicates that GPT-4 judged it as failing to follow the instruction, while a green box signifies success.
