PlatoLM: Teaching LLMs in Multi-Round Dialogue via a User Simulator

Chuyi Kong; Yaxin Fan; Xiang Wan; Feng Jiang; Benyou Wang

PlatoLM: Teaching LLMs in Multi-Round Dialogue via a User Simulator

Chuyi Kong, Yaxin Fan, Xiang Wan, Feng Jiang, Benyou Wang

TL;DR

PlatoLM tackles the scarcity of authentic multi-turn human-data by replacing static ChatGPT roleplay with a trainable Socratic user simulator. SocraticChat, generated through interactive questioning between Socratic and a strong middle model, provides richer, more human-like prompts to train PlatoLM, achieving state-of-the-art results among 7B-LLaMA-based models on MT-Bench. Ablation studies confirm the benefits of seed-domain data, cross-backbone collaboration, and dynamic training paradigms for multi-turn learning. The work highlights the importance of human-like questioning patterns in teaching LLMs and demonstrates data efficiency, transferability, and ethical considerations, while acknowledging dataset quality and scale limitations.

Abstract

The unparalleled performance of closed-sourced ChatGPT has sparked efforts towards its democratization, with notable strides made by leveraging real user and ChatGPT dialogues, as evidenced by Vicuna. However, due to challenges in gathering dialogues involving human participation, current endeavors like Baize and UltraChat rely on ChatGPT conducting roleplay to simulate humans based on instructions, resulting in overdependence on seeds, diminished human-likeness, limited topic diversity, and an absence of genuine multi-round conversational dynamics. To address the above issues, we propose a paradigm to simulate human behavior better and explore the benefits of incorporating more human-like questions in multi-turn conversations. Specifically, we directly target human questions extracted from genuine human-machine conversations as a learning goal and provide a novel user simulator called `Socratic'. The experimental results show our response model, `PlatoLM', achieves SoTA performance among LLaMA-based 7B models in MT-Bench. Our findings further demonstrate that our method introduces highly human-like questioning patterns and rich topic structures, which can teach the response model better than previous works in multi-round conversations.

PlatoLM: Teaching LLMs in Multi-Round Dialogue via a User Simulator

TL;DR

Abstract

Paper Structure (55 sections, 14 figures, 10 tables)

This paper contains 55 sections, 14 figures, 10 tables.

Introduction
Background
Methodology
The User Simulator - Socratic
Data Preprocessing
Training Protocol
The Conversation Dataset - SocraticChat
Optional Seed Mode
Automatic Termination Mechanism
The System Agent - PlatoLM
Experiments
Baseline Trials
Metrics
Results
Ablation Studies
...and 40 more sections

Figures (14)

Figure 1: Analogy to Socratic Teaching of Methodology
Figure 2: Comparison between Vicuna, UltraLM, and PlatoLM. The commonness of the three models is that they all learn from a user-system conversation data. Note that training Socratic and PlatoLM (also for Vicuna and UltraLM) is symmetrical; the difference is that the former mimics the user and the latter mimics the system.
Figure 3: Vicuna-Bench (GPT-4)
Figure 4: MT-Bench (GPT-4)
Figure 5: Vicuna-Bench (Human)
...and 9 more figures

PlatoLM: Teaching LLMs in Multi-Round Dialogue via a User Simulator

TL;DR

Abstract

PlatoLM: Teaching LLMs in Multi-Round Dialogue via a User Simulator

Authors

TL;DR

Abstract

Table of Contents

Figures (14)