Beyond the Turn-Based Game: Enabling Real-Time Conversations with Duplex Models

Xinrong Zhang, Yingfa Chen, Shengding Hu, Xu Han, Zihang Xu, Yuanwei Xu, Weilin Zhao, Maosong Sun, Zhiyuan Liu

TL;DR

The paper defines duplex models, which can listen and respond at the same time, to address the main limitation of traditional turn-based LLM chat systems. It introduces time-division multiplexing (TDM) over time slices to process input and output pseudo-simultaneously, and builds a large-scale duplex-tuning dataset (Duplex-UltraChat) to train models such as MiniCPM-duplex. After fine-tuning, the duplex model preserves its performance on standard benchmarks, while automated GPT-4 judgments and human evaluations show significantly improved responsiveness and perceived human-likeness, and higher user satisfaction in real-time conversations. The work provides a practical demonstration, releases datasets and a model, and discusses remaining challenges such as data quality, decoding strategies, and TTS smoothing, outlining a path toward more natural human-AI interactions.

Abstract

As large language models (LLMs) increasingly permeate daily lives, there is a growing demand for real-time interactions that mirror human conversations. Traditional turn-based chat systems driven by LLMs prevent users from verbally interacting with the system while it is generating responses. To overcome these limitations, we adapt existing LLMs to duplex models so that these LLMs can listen to users while generating output and dynamically adjust themselves to provide users with instant feedback. Specifically, we divide the queries and responses of conversations into several time slices and then adopt a time-division-multiplexing (TDM) encoding-decoding strategy to pseudo-simultaneously process these slices. Furthermore, to make LLMs proficient enough to handle real-time conversations, we build a fine-tuning dataset consisting of alternating time slices of queries and responses as well as covering typical feedback types in instantaneous interactions. Our experiments show that although the queries and responses of conversations are segmented into incomplete slices for processing, LLMs can preserve their original performance on standard benchmarks with a few fine-tuning steps on our dataset. Automatic and human evaluations indicate that duplex models make user-AI interactions more natural and human-like, and greatly improve user satisfaction compared to vanilla LLMs. Our duplex model and dataset will be released.
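
The alternating time slices described in the abstract can be illustrated with a minimal sketch. This is not the paper's actual slice format or tokenization: the function names, the word-count slicing rule, and the `words_per_slice` parameter are all assumptions made for illustration. It only shows the general idea of time-division multiplexing, where two streams (user input and model output) share one sequence by taking turns in short slices.

```python
from itertools import zip_longest


def slice_words(text, words_per_slice):
    """Split text into consecutive chunks of at most `words_per_slice` words.

    Hypothetical stand-in for a time slice; a real system would more
    plausibly slice by elapsed time rather than by word count.
    """
    words = text.split()
    return [
        " ".join(words[i:i + words_per_slice])
        for i in range(0, len(words), words_per_slice)
    ]


def interleave_slices(query, response, words_per_slice=3):
    """Alternate user and assistant slices on a single shared timeline."""
    user_slices = slice_words(query, words_per_slice)
    assistant_slices = slice_words(response, words_per_slice)
    timeline = []
    # zip_longest pads the shorter stream with None so no slice is dropped.
    for user_part, assistant_part in zip_longest(user_slices, assistant_slices):
        if user_part is not None:
            timeline.append(("user", user_part))
        if assistant_part is not None:
            timeline.append(("assistant", assistant_part))
    return timeline


if __name__ == "__main__":
    for speaker, text in interleave_slices(
        "tell me about the weather in Beijing today",
        "Sure, it is sunny",
        words_per_slice=3,
    ):
        print(f"{speaker}: {text}")
```

Each slice on its own is an incomplete fragment of a query or response, which is why the abstract notes that fine-tuning on sliced data is needed before a model can handle this input format.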

Paper Structure (52 sections, 7 figures, 5 tables)


Figures (7)

  • Figure 1: Illustration of the input/output processing scheme of traditional models and duplex models. Traditional models receive the complete input from the user before generating the response. In contrast, duplex models process the input and generate the output in an online manner.
  • Figure 2: Responses of MiniCPM when inputs are time slices.
  • Figure 3: An example of uninterrupted dialogue in Duplex-UltraChat.
  • Figure 4: Some examples from Duplex-UltraChat.
  • Figure 5: The human evaluation score distributions for MiniCPM and MiniCPM-duplex regarding responsiveness, human-likeness, factuality, faithfulness, and overall satisfaction.
  • ...and 2 more figures