RuleR: Improving LLM Controllability by Rule-based Data Recycling

Ming Li, Han Chen, Chenguang Wang, Dang Nguyen, Dianqi Li, Tianyi Zhou

TL;DR

Rule-based Data Recycling (RuleR) is a data augmentation method that incorporates multiple constraints into the original data samples according to predefined rules, creating new training tasks that consolidate the controllability of LLMs.

Abstract

Large language models (LLMs) still lack fine-grained controllability over their responses, which is critical to enhancing their performance and the user experience. However, curating supervised fine-tuning (SFT) datasets to improve LLM controllability usually relies on human experts or proprietary LLMs, which incurs additional cost. To bridge this gap, we propose Rule-based Data Recycling (RuleR), a data augmentation method that incorporates multiple constraints into the original data samples according to predefined rules, creating new training tasks that consolidate the controllability of LLMs. Instead of creating new data from scratch, RuleR "recycles" existing data by simply applying rule-based edits to the responses and appending the corresponding rule-instructions to the original instructions. Experimental results demonstrate RuleR's effectiveness in improving LLM controllability while maintaining general instruction-following capabilities.
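
The recycling step described in the abstract can be illustrated with a minimal Python sketch. The `Sample` and `Rule` types, the `recycle` function, and the uppercase rule are all hypothetical names chosen for illustration; they are not taken from the paper's rule set or code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Sample:
    instruction: str
    response: str


@dataclass(frozen=True)
class Rule:
    # Hypothetical rule: a natural-language constraint plus a deterministic edit.
    rule_instruction: str
    edit: Callable[[str], str]


def recycle(sample: Sample, rule: Rule) -> Sample:
    """Append the rule-instruction to the instruction and edit the response to match."""
    return Sample(
        instruction=f"{sample.instruction} {rule.rule_instruction}",
        response=rule.edit(sample.response),
    )


# Illustrative rule only (an assumption, not one of the paper's rules).
UPPERCASE = Rule("Answer in all capital letters.", str.upper)

original = Sample("Name the capital of France.", "The capital of France is Paris.")
augmented = recycle(original, UPPERCASE)
print(augmented.instruction)
print(augmented.response)
```

No human or model rewriting is involved: the new instruction-response pair is derived from the existing one by a deterministic edit, and the original sample is left untouched.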

Paper Structure

This paper contains 19 sections, 4 equations, 4 figures, and 4 tables.

Figures (4)

  • Figure 1: Comparison of a widely used data generation strategy (top) and RuleR (bottom) for enhancing LLM controllability. Most existing methods rely on human or model rewriting to generate new instructions and responses. However, discarding existing data is a waste of effort. RuleR demonstrates that simple rule-based (human- and model-free) editing of existing data can generate new SFT data that improves LLM controllability.
  • Figure 2: Examples of our data recycling workflows. Panels (a), (b), and (c) select different predefined rules to modify the original data so that it fulfills constraints on the complexity or format of the response. The differences in the new responses are highlighted in red. The example in (a) already satisfies the appended constraint, so its response is kept unchanged.
  • Figure 3: Examples with multiple rules selected and implemented. The randomly generated rule-instructions are colored in violet. The upper example is augmented by two different rules (Paragraph Wrapping and Sentence Wrapping); the bottom example is augmented by three different rules (Sentence Case, Keyword Wrapping, and Sentence Wrapping). The format differences in the new responses are highlighted in red.
  • Figure 4: The prompt we used to ask GPT-4 to evaluate the responses.
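
The captions for Figures 2 and 3 mention two behaviors that a short Python sketch may make concrete: several rules applied to one sample with randomly generated rule-instructions, and a response that is left unchanged because it already satisfies the constraint. Everything below is an assumption made for illustration: the rule factories, the wrapper pairs, and the wording of the rule-instructions are hypothetical and do not reproduce the paper's actual Sentence Case or Sentence Wrapping rules.

```python
import random
import re
from typing import Callable, List, Tuple

# A hypothetical rule is a (rule-instruction, edit function) pair.
Rule = Tuple[str, Callable[[str], str]]

# Assumed pool of wrapper pairs; not taken from the paper.
WRAPPERS = [("<s>", "</s>"), ("[", "]"), ("<<", ">>")]


def make_lowercase(rng: random.Random) -> Rule:
    # Illustrative case rule; the edit is a no-op on text that is already lowercase.
    return "Respond in lowercase letters only.", str.lower


def make_sentence_wrapping(rng: random.Random) -> Rule:
    # The wrapper pair is drawn at random, so the rule-instruction varies per sample.
    left, right = rng.choice(WRAPPERS)

    def edit(text: str) -> str:
        # Naive sentence splitter, adequate only for this illustration.
        sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
        return " ".join(f"{left}{s}{right}" for s in sentences)

    return f"Wrap each sentence in {left} and {right}.", edit


def recycle_with_rules(
    instruction: str, response: str, rules: List[Rule]
) -> Tuple[str, str]:
    """Append every rule-instruction and apply every edit, in order."""
    for rule_instruction, edit in rules:
        instruction = f"{instruction} {rule_instruction}"
        response = edit(response)
    return instruction, response


rng = random.Random(0)  # seeded so the sketch is reproducible
rules = [factory(rng) for factory in (make_lowercase, make_sentence_wrapping)]
new_instruction, new_response = recycle_with_rules(
    "Describe the water cycle.",
    "Water evaporates. It then condenses and falls as rain.",
    rules,
)
print(new_instruction)
print(new_response)

# A response that already satisfies the constraint passes through unchanged.
_, unchanged = recycle_with_rules("Say hi.", "hello there.", [make_lowercase(rng)])
print(unchanged)
```

Writing each edit so that it leaves an already compliant response untouched is one simple way to obtain the behavior described for panel (a) of Figure 2; an explicit satisfaction check before editing would be an equally reasonable design.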