Ada-Instruct: Adapting Instruction Generators for Complex Reasoning

Wanyun Cui, Qianle Wang

TL;DR

Fine-tuning open-source LLMs on only ten examples yields instruction generators that produce complex instructions while maintaining distributional consistency for complex reasoning tasks. The paper introduces Ada-Instruct, an adaptive instruction generator built through such fine-tuning.

Abstract

Instruction augmentation is a crucial step for unleashing the full potential of large language models (LLMs) in downstream tasks. Existing Self-Instruct methods primarily generate new instructions from a few initial instructions using in-context learning. However, our study identifies a critical flaw in this approach: even with GPT-4o, Self-Instruct cannot generate complex instructions of length $\ge 100$, which are necessary in complex tasks such as code completion. To address this issue, our key insight is that fine-tuning open-source LLMs with only ten examples can produce complex instructions that maintain distributional consistency for complex reasoning tasks. We introduce Ada-Instruct, an adaptive instruction generator developed through fine-tuning. We empirically validate Ada-Instruct's efficacy across different applications. The results highlight Ada-Instruct's capacity to generate long, intricate, and distributionally consistent instructions.
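The length criterion above can be checked with a few lines of code. The following is a minimal sketch, not the authors' evaluation script. It assumes whitespace splitting as a stand-in for a real tokenizer (the source only says length is counted in tokens), and `token_length`, `fraction_long`, and the sample strings are hypothetical names introduced for illustration.

```python
def token_length(instruction: str) -> int:
    """Approximate token count. Whitespace splitting is an assumption;
    the source counts tokens but does not name the tokenizer here."""
    return len(instruction.split())


def fraction_long(instructions: list[str], threshold: int = 100) -> float:
    """Share of instructions whose approximate length is >= threshold."""
    if not instructions:
        return 0.0
    long_count = sum(1 for text in instructions if token_length(text) >= threshold)
    return long_count / len(instructions)


# Hypothetical generated instructions: one short, one long (120 words).
samples = ["Write a function that adds two numbers.", "word " * 120]
print(fraction_long(samples))  # 0.5
```

A generator that matches the target distribution on a task with long instructions should show a clearly non-zero fraction here, while a generator that only emits short instructions would score near zero.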

Paper Structure (31 sections, 2 equations, 6 figures, 8 tables)

This paper contains 31 sections, 2 equations, 6 figures, 8 tables.

Figures (6)

  • Figure 1: Length distribution of different methods. Length is measured by the number of tokens. All methods start with the same 10 instructions. Self-Instruct struggles to generate complex instructions with more tokens, even when explicitly asked to do so. The more advanced GPT-4o still has this issue (HumanEval and GSM8K panels). Ada-Instruct successfully produces instructions whose lengths are consistently aligned with the target distribution.
  • Figure 2: How Ada-Instruct works. We fine-tune LLMs as instruction generators from few-shot initial samples (step 1), whereas previous Self-Instruct methods use in-context prompting and closed-source LLMs. We then use ChatGPT to generate labels (step 2) and fine-tune a task-specific model on the labeled samples (step 3).
  • Figure 3: Semantic distribution of generated instructions by t-SNE. Ada-Instruct shows better semantic distribution consistency than Evol-Instruct.
  • Figure 4: Similarity score distribution. Ada-Instruct generally has lower similarity scores than Self-Instruct, indicating higher diversity.
  • Figure 5: All generated instructions (noisy) vs correct instructions only on MBPP. The correctness is verified by test cases generated from gpt-3.5-turbo-instruct. Using noisy instructions does not cause a significant performance decline.
  • ...and 1 more figure
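The three steps described in the Figure 2 caption can be sketched as a small pipeline. This is an illustrative outline under stated assumptions, not the authors' implementation: `ada_instruct_pipeline` and the three callables passed to it are hypothetical placeholders, and the toy stand-ins below do no real fine-tuning or labeling.

```python
from typing import Callable


def ada_instruct_pipeline(
    seed_instructions: list[str],
    finetune_generator: Callable[[list[str]], Callable[[int], list[str]]],
    label: Callable[[str], str],
    finetune_task_model: Callable[[list[tuple[str, str]]], object],
    num_instructions: int,
) -> object:
    # Step 1: fine-tune an LLM on the few-shot seeds, then sample new instructions.
    generate = finetune_generator(seed_instructions)
    instructions = generate(num_instructions)
    # Step 2: label each generated instruction (the caption names ChatGPT for this).
    labeled = [(text, label(text)) for text in instructions]
    # Step 3: fine-tune a task-specific model on the labeled pairs.
    return finetune_task_model(labeled)


# Toy stand-ins (assumptions) so the sketch runs end to end.
def toy_finetune_generator(seeds: list[str]) -> Callable[[int], list[str]]:
    def generate(n: int) -> list[str]:
        return [f"{seeds[i % len(seeds)]} (variant {i})" for i in range(n)]
    return generate


def toy_label(text: str) -> str:
    return f"answer to: {text}"


def toy_finetune_task_model(pairs: list[tuple[str, str]]) -> dict[str, str]:
    return dict(pairs)  # a lookup table stands in for a trained model


model = ada_instruct_pipeline(
    ["Sort a list."], toy_finetune_generator, toy_label, toy_finetune_task_model, 3
)
print(len(model))  # 3
```

Passing the stages in as callables keeps the sketch independent of any particular training or inference library; each stand-in would be replaced by a real fine-tuning or labeling routine.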