Parrot: Enhancing Multi-Turn Instruction Following for Large Language Models
Yuchong Sun, Che Liu, Kun Zhou, Jinwen Huang, Ruihua Song, Wayne Xin Zhao, Fuzheng Zhang, Di Zhang, Kun Gai
TL;DR
Parrot tackles the underexplored area of multi-turn instruction following in LLMs by (a) automatically collecting human-like multi-turn instructions through Parrot-Ask, (b) introducing Context-aware Preference Optimization (CaPO) to train models to better leverage context, and (c) establishing MT-Bench++ to evaluate long-turn capabilities. The authors construct Parrot-40K, a long-turn, context-rich dataset including 30K negative examples, enhancing supervision beyond prior datasets. Empirical results show that Parrot-Chat with CaPO achieves state-of-the-art performance among 13B open-source models on MT-Bench and MT-Bench++, with notable improvements on later turns. The work provides open-source data and methods, enabling broader study and development of robust multi-turn instruction-following LLMs, while acknowledging limitations in benchmark size and data sources and outlining safety considerations.
Abstract
Humans often interact with large language models (LLMs) over multiple turns to obtain desired answers or more information. However, most existing studies overlook the multi-turn instruction-following ability of LLMs in their training datasets, training methods, and evaluation benchmarks. In this paper, we introduce Parrot, a solution aiming to enhance multi-turn instruction following for LLMs. First, we introduce an efficient and effective method for collecting multi-turn instructions whose queries have human-like features, such as anaphora and ellipsis. Second, we propose a context-aware preference optimization strategy to further enhance LLMs for complex queries in multi-turn interaction. Moreover, to quantitatively evaluate LLMs in multi-turn instruction following, we manually build a multi-turn benchmark derived from existing ones. Extensive experiments show that Parrot improves current LLMs by up to 7.2% in multi-turn instruction following. Our dataset and code will be open-sourced to facilitate future research.
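
To make the idea of a context-dependent query with a negative example concrete, the following is a minimal illustrative sketch in Python. It is not the authors' implementation or data format: the `PreferencePair` record, its field names, the `build_prompt` helper, and the sample dialogue are all hypothetical, chosen only to show how a final query containing anaphora ("she") depends on earlier turns, and how a preferred reply might be paired with a reply that neglects the context.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class PreferencePair:
    """Hypothetical record: dialogue history, final query, and two candidate replies."""
    history: List[Tuple[str, str]]  # earlier (user, assistant) turns
    query: str                      # final user turn; may rely on anaphora or ellipsis
    chosen: str                     # reply that uses the history correctly
    rejected: str                   # negative example that ignores the history


def build_prompt(pair: PreferencePair) -> str:
    """Flatten the history plus the final query into a single prompt string."""
    lines = []
    for user_turn, assistant_turn in pair.history:
        lines.append(f"User: {user_turn}")
        lines.append(f"Assistant: {assistant_turn}")
    lines.append(f"User: {pair.query}")
    lines.append("Assistant:")
    return "\n".join(lines)


# Illustrative data only: "she" can be resolved only from the earlier turn.
example = PreferencePair(
    history=[("Who wrote Pride and Prejudice?", "Jane Austen wrote it.")],
    query="When was she born?",
    chosen="Jane Austen was born in 1775.",
    rejected="Could you tell me who you are asking about?",
)
prompt = build_prompt(example)
```

Under these assumptions, a preference-based training step would score `chosen` above `rejected` given the same `prompt`, so the only way to prefer the right reply is to read the earlier turns.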
