Inductive-Deductive Strategy Reuse for Multi-Turn Instructional Dialogues

Jiao Ou, Jiayu Wu, Che Liu, Fuzheng Zhang, Di Zhang, Kun Gai

TL;DR

This paper takes inspiration from the cognitive abilities inherent in human learning and proposes the explicit modeling of complex dialogue flows through instructional strategy reuse, which can generate diverse, in-depth, and insightful instructions for a given dialogue history.

Abstract

Aligning large language models (LLMs) with human expectations requires high-quality instructional dialogues, which in turn require instructions that are diverse and in-depth. Existing methods leverage two LLMs to interact for automatic collection: one simulating a user to pose instructions, and the other acting as a system agent to respond. However, without explicit guidance, these user simulators struggle to model the rules behind how different instructions can be posed in a dialogue, resulting in generic instructions. In this paper, we propose to explicitly capture these complex rules to help the user simulator pose diverse and in-depth instructions. Specifically, we first induce high-level instruction strategies from various real instructional dialogues to serve as rules. Afterward, different possible strategies are applied deductively to a newly given dialogue scenario to pose various instructions. Experimental results show that our method can generate diverse and in-depth instructions. The constructed multi-turn instructional dialogues can outperform competitive baselines on the downstream chat model.

Paper Structure (50 sections, 9 equations, 3 figures, 15 tables)


Figures (3)

  • Figure 1: An example of humans generating instructions by deductively utilizing instruction strategies, derived from inductive analysis of instructional dialogues.
  • Figure 2: The overall architecture for building multi-turn instructional dialogues. In the induction stage, IDEAS induces high-level strategies $\mathcal{F}$ from human-machine instructional dialogues $\mathcal{D}_{ins}$. In the deduction stage, the user simulator iteratively interacts with a system agent to produce new multi-turn dialogues based on the given opening line $\{\boldsymbol{q}'_0, \boldsymbol{a}'_0\}$, as shown on the left side. To generate the current instruction $\boldsymbol{q}'_t$, the user simulator first chooses an appropriate strategy $\boldsymbol{f}_t$ from the candidates $Q(\boldsymbol{a}'_{t-1})$ based on the dialogue history $\mathbf{h}'_t$, and then generates $\boldsymbol{q}'_t$. If the quality does not meet the requirement, $\boldsymbol{q}'_t$ is regenerated. This process is shown on the right side.
  • Figure 3: Changes in chat model performance when provided with different amounts of instructional dialogues generated by IDEAS and by Parrot-Ask, respectively.
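
The deduction loop described in the Figure 2 caption can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: every function name and parameter here is hypothetical, the LLM calls are replaced by plain callables, and the retry cap `max_retries` and the choice to keep the same strategy when regenerating are assumptions (the caption only says the instruction is regenerated if its quality is insufficient).

```python
from typing import Callable, List, Tuple

Turn = Tuple[str, str]  # (instruction q'_t, response a'_t)


def deduce_dialogue(
    opening: Turn,
    candidate_strategies: Callable[[str], List[str]],  # stands in for Q(a'_{t-1})
    choose_strategy: Callable[[List[Turn], List[str]], str],
    pose_instruction: Callable[[List[Turn], str], str],  # user simulator
    respond: Callable[[List[Turn], str], str],  # system agent
    is_good: Callable[[str], bool],  # quality check
    num_turns: int = 3,
    max_retries: int = 2,  # assumption: cap on regeneration attempts
) -> List[Turn]:
    """Grow a dialogue from an opening turn, one strategy-guided instruction at a time."""
    history: List[Turn] = [opening]
    for _ in range(num_turns):
        previous_answer = history[-1][1]
        # Candidate strategies depend on the previous system response.
        candidates = candidate_strategies(previous_answer)
        # Pick one strategy f_t given the dialogue history h'_t.
        strategy = choose_strategy(history, candidates)
        instruction = pose_instruction(history, strategy)
        # Regenerate while the quality check fails, up to the assumed cap.
        retries = 0
        while not is_good(instruction) and retries < max_retries:
            instruction = pose_instruction(history, strategy)
            retries += 1
        answer = respond(history, instruction)
        history.append((instruction, answer))
    return history


# Toy stand-ins for the LLM-backed components (purely illustrative).
def toy_candidates(previous_answer: str) -> List[str]:
    return ["ask for an example", "ask for a deeper explanation"]


def toy_choose(history: List[Turn], candidates: List[str]) -> str:
    return candidates[len(history) % len(candidates)]


def toy_pose(history: List[Turn], strategy: str) -> str:
    return f"[{strategy}] about: {history[-1][1]}"


def toy_respond(history: List[Turn], instruction: str) -> str:
    return f"answer {len(history)}"


dialogue = deduce_dialogue(
    opening=("What is induction?", "answer 0"),
    candidate_strategies=toy_candidates,
    choose_strategy=toy_choose,
    pose_instruction=toy_pose,
    respond=toy_respond,
    is_good=lambda q: len(q) > 0,
    num_turns=2,
)
```

In a real setting, `pose_instruction` and `respond` would wrap calls to the user simulator and the system agent, and `candidate_strategies` would draw on the strategies induced in the first stage; how those components are actually built is not specified in this summary.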