Better Alignment with Instruction Back-and-Forth Translation

Thao Nguyen; Jeffrey Li; Sewoong Oh; Ludwig Schmidt; Jason Weston; Luke Zettlemoyer; Xian Li

TL;DR

This work addresses aligning large language models (LLMs) to follow instructions while staying grounded in world knowledge, by constructing high-quality synthetic data via instruction back-and-forth translation. The pipeline first backtranslates web text into instructions, then rewrites the corresponding responses with an aligned LLM, producing (instruction, rewritten response) pairs from open web sources such as Dolma. Fine-tuning Llama-2-7B and Llama-2-70B on these pairs yields higher win rates on AlpacaEval than fine-tuning on other common instruction datasets, and rewriting proves more effective than direct distillation. Analyses show that rewritten and distilled responses are clearly separated in embedding space, that rewritten responses are more diverse and complex than distilled ones, and that backtranslated instructions retain the information diversity of the web. The approach offers a scalable path to higher-quality instruction data for alignment, combining the information richness of the web with the response quality of model-generated text.

Abstract

We propose a new method, instruction back-and-forth translation, to construct high-quality synthetic data grounded in world knowledge for aligning large language models (LLMs). Given documents from a web corpus, we generate and curate synthetic instructions using the backtranslation approach proposed by Li et al. (2023a), and rewrite the responses to improve their quality further based on the initial documents. Fine-tuning with the resulting (backtranslated instruction, rewritten response) pairs yields higher win rates on AlpacaEval than using other common instruction datasets such as Humpback, ShareGPT, Open Orca, Alpaca-GPT4 and Self-instruct. We also demonstrate that rewriting the responses with an LLM outperforms direct distillation, and that the two generated text distributions are clearly distinct in embedding space. Further analysis shows that our backtranslated instructions are of higher quality than other sources of synthetic instructions, while our responses are more diverse and complex than those obtained from distillation. Overall, we find that instruction back-and-forth translation combines the best of both worlds -- making use of the information diversity and quantity found on the web, while ensuring the quality of the responses, which is necessary for effective alignment.
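The three stages described above (backtranslate, curate, rewrite) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the three callables stand in for the fine-tuned instruction generator, the seed-tuned scoring model, and the aligned rewriting model, and all names here (`Pair`, `back_and_forth`, `keep_top`) are hypothetical. Selecting the top-k pairs by score is also an assumption; the source only says that the highest-scoring pairs are kept.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Pair:
    instruction: str
    response: str
    score: float = 0.0


def back_and_forth(
    documents: List[str],
    generate_instruction: Callable[[str], str],   # hypothetical backward model
    score_pair: Callable[[str, str], float],      # hypothetical quality scorer
    rewrite_response: Callable[[str, str], str],  # hypothetical aligned LLM
    keep_top: int,                                # assumption: top-k selection
) -> List[Pair]:
    # (1) Backtranslation: predict an instruction each web document could answer.
    pairs = [Pair(generate_instruction(doc), doc) for doc in documents]

    # (2) Curation: score every (instruction, web response) pair, keep the best.
    for pair in pairs:
        pair.score = score_pair(pair.instruction, pair.response)
    best = sorted(pairs, key=lambda p: p.score, reverse=True)[:keep_top]

    # (3) Rewriting: improve each response, conditioned on the instruction
    #     and the original web text.
    return [
        Pair(p.instruction, rewrite_response(p.instruction, p.response), p.score)
        for p in best
    ]


# Toy stand-ins so the sketch runs end to end; real use would call LLMs here.
def toy_generate(doc: str) -> str:
    return "Explain: " + doc[:20]


def toy_score(instruction: str, response: str) -> float:
    return float(len(response))


def toy_rewrite(instruction: str, response: str) -> str:
    return response.strip() + " [rewritten]"
```

Calling `back_and_forth(docs, toy_generate, toy_score, toy_rewrite, keep_top=2)` returns the two highest-scoring pairs with rewritten responses. Passing the models as plain callables keeps the control flow separate from any particular inference library.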

Paper Structure (46 sections, 4 figures, 8 tables)

Introduction
Method
Background: instruction backtranslation
The rewriting process
Experiment setup
Training details
Data
Model
Evaluation
Baselines
Fine-tuning results
7B scale
70B scale
Understanding rewritten data quality
Rewriting versus Distilling
...and 31 more sections

Figures (4)

Figure 1: Overview of our proposed pipeline: instruction back-and-forth translation. (1) We first fine-tune a base model, e.g. Llama-2, on some seed examples from Open Assistant, for the task of instruction generation. We then extract initial candidate responses from a web corpus, e.g. Dolma, and use the fine-tuned model to obtain synthetic instructions that would go with the corresponding responses; (2) We separately obtain an instruction-following model by fine-tuning the same base model on the seed examples, and use it to score the quality of the (synthetic instruction, web-scraped response) pairs; (3) With the highest scoring pairs, we ask an existing aligned model (e.g. Llama-2-chat) to improve the responses further, conditioned on the generated instructions and the initial web texts. Steps (1) and (2) follow Li et al. (2023a) with some modifications (i.e. using preprocessed documents from Dolma instead of parsing raw HTML from ClueWeb). We provide a specific data example in the bottom row.
Figure 2: Quality of instruction prompts from various datasets, measured by their ability to distill useful information from a fixed model. We randomly sample 12.8K instructions from each dataset in our experiments and input them to Llama-2-70B-chat to distill its knowledge. The quality of the instructions is then measured by the performance of a model (Llama-2-7B) fine-tuned on the (instruction, distilled response) pairs. We find that backtranslated instructions surpass other synthetic instruction generation methods (e.g. Alpaca), while still underperforming human-written queries (e.g. ShareGPT). This gap is partly, but not entirely, due to ShareGPT having longer instructions.
Figure 3: Diversity of instructions and responses from our backtranslation data and other common instruction datasets, as measured by the number of unique trigrams. We find that (i) while instructions generated in this work are more diverse than synthetic instructions from existing baselines including ClueWeb and Alpaca-GPT4, our instruction diversity still lags behind that of manually crafted prompts; (ii) in terms of responses, using web-crawled texts sourced from Dolma yields the most diverse responses; rewriting reduces the response diversity slightly, but still offers much more text diversity than distillation.
Figure 7: Performance of fine-tuning on responses rewritten by different models. We find that using a smaller model (i.e. Llama-2-chat-7B) or a less aligned model (i.e. Llama-2 fine-tuned on Open Assistant) for rewriting yields lower response quality, as reflected in the win rate of a Llama-2-7B model fine-tuned on the resulting response data.
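
The diversity measure used in Figure 3, the number of unique trigrams, can be illustrated with a short sketch. The tokenization here (lowercasing and splitting on whitespace) is an assumption, since the captions do not say how the text is tokenized, and the function name `unique_trigrams` is hypothetical.

```python
from typing import Iterable, Set, Tuple

Trigram = Tuple[str, str, str]


def unique_trigrams(texts: Iterable[str]) -> Set[Trigram]:
    """Collect the distinct word trigrams that appear across a set of texts."""
    seen: Set[Trigram] = set()
    for text in texts:
        # Assumption: lowercase + whitespace split; a real pipeline might
        # use a subword tokenizer instead.
        tokens = text.lower().split()
        # zip stops at the shortest slice, so texts with fewer than three
        # tokens contribute nothing.
        seen.update(zip(tokens, tokens[1:], tokens[2:]))
    return seen


sample = ["the cat sat on the mat", "the cat sat down"]
count = len(unique_trigrams(sample))
```

Comparing this count across datasets only makes sense at a fixed sample size, because a larger dataset accumulates more unique trigrams regardless of how varied its text is.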
