Self-Directed Synthetic Dialogues and Revisions Technical Report
Nathan Lambert, Hailey Schoelkopf, Aaron Gokaslan, Luca Soldaini, Valentina Pyatkin, Louis Castricato
TL;DR
The paper addresses the scarcity of open, multi-turn synthetic data for fine-tuning language models by introducing Self-Directed Synthetic Dialogues (SDSD), a procedurally generated, topic-principled dialogue dataset created from open models (DBRX, Llama 2 70B, Mistral Large). SDSD uses an initial plan as a system prompt to guide self-chat between two instances of the same model, and employs a critique–revision loop to produce a SDSD-Revisions (SDSD-R) dataset that encodes preference data. Building on RLHF and Constitutional AI, the authors detail the data-creation workflow, model prompts, and evaluation of dialogue properties (e.g., turn length, principle violations). The work demonstrates the feasibility of generating long-form, open-model synthetic data and discusses practical considerations, limitations, and future directions for employing SDSD/SDSD-R in open-model fine-tuning and alignment research.
Abstract
Synthetic data has become an important tool in the fine-tuning of language models to follow instructions and solve complex problems. Nevertheless, the majority of open data to date is often lacking multi-turn data and collected on closed models, limiting progress on advancing open fine-tuning methods. We introduce Self Directed Synthetic Dialogues (SDSD), an experimental dataset consisting of guided conversations of language models talking to themselves. The dataset consists of multi-turn conversations generated with DBRX, Llama 2 70B, and Mistral Large, all instructed to follow a conversation plan generated prior to the conversation. We also explore including principles from Constitutional AI and other related works to create synthetic preference data via revisions to the final conversation turn. We hope this work encourages further exploration in multi-turn data and the use of open models for expanding the impact of synthetic data.
