Advancing Multi-Party Dialogue Framework with Speaker-ware Contrastive Learning

Zhongtian Hu, Qi He, Ronghan Li, Meng Zhao, Lifang Wang

TL;DR

This paper tackles multi-party dialogue generation, where traditional graph-based methods depend on annotated graph structures and struggle to model consistent speaker styles. It introduces CMR, a two-stage contrastive learning framework that first learns speaker-discriminative utterance representations (Stage I) and then jointly optimizes response generation with a contrastive objective to capture discourse themes (Stage II). In experiments on the Friends and Ubuntu datasets with T5 and LLaMA-3.1 backbones, CMR achieves strong results in both automatic and human evaluations, and an LLM-based pairwise preference study clearly favors CMR-equipped models. The approach eliminates the need for annotated graphs, is robust to varying speaker counts, and integrates well with large pre-trained models, offering a scalable path for improving multi-party dialogue systems.

Abstract

Multi-party dialogues, common in collaborative scenarios like brainstorming sessions and negotiations, pose significant challenges due to their complexity and diverse speaker roles. Current methods often use graph neural networks to model dialogue context, capturing structural dynamics but relying heavily on annotated graph structures and overlooking individual speaking styles. To address these challenges, we propose CMR, a Contrastive learning-based Multi-party dialogue Response generation framework. CMR uses two stages of self-supervised contrastive learning. First, it captures global differences in speaking styles across individuals. Then, it focuses on intra-conversation comparisons to identify thematic transitions and contextually relevant facts. To the best of our knowledge, this is the first approach to apply contrastive learning to multi-party dialogue generation. Experimental results demonstrate that CMR not only significantly outperforms state-of-the-art models, but also generalizes well to large pre-trained language models, effectively enhancing their capability in handling multi-party conversations.
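
The first stage described in the abstract, separating speaking styles across individuals, can be illustrated with a generic supervised contrastive loss in which utterances by the same speaker act as positives and all other utterances act as negatives. This is only a minimal sketch under stated assumptions, not the loss defined in the paper: the function names, the cosine similarity, the temperature value, and the toy 2-dimensional embeddings are all hypothetical choices made for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def speaker_contrastive_loss(embeddings, speakers, temperature=0.1):
    """Generic supervised contrastive loss (assumed form, not the paper's).

    Utterances sharing a speaker label are treated as positives. Anchors
    with no same-speaker partner are skipped. Returns the mean loss over
    the remaining anchors, or 0.0 if there are none.
    """
    n = len(embeddings)
    losses = []
    for i in range(n):
        positives = [j for j in range(n)
                     if j != i and speakers[j] == speakers[i]]
        if not positives:
            continue
        # Temperature-scaled similarity of anchor i to every other utterance.
        logits = {j: cosine(embeddings[i], embeddings[j]) / temperature
                  for j in range(n) if j != i}
        # Numerically stable log-sum-exp over all candidates.
        max_logit = max(logits.values())
        log_denom = max_logit + math.log(
            sum(math.exp(v - max_logit) for v in logits.values()))
        # Average negative log-probability assigned to the positives.
        loss_i = -sum(logits[j] - log_denom for j in positives) / len(positives)
        losses.append(loss_i)
    return sum(losses) / len(losses) if losses else 0.0


# Toy utterance embeddings: the first two point one way, the last two another.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]

# Speaker labels that agree with the embedding clusters give a low loss.
loss_clustered = speaker_contrastive_loss(
    toy_embeddings, ["John", "John", "Mike", "Mike"])

# Labels that cut across the clusters give a much higher loss.
loss_mixed = speaker_contrastive_loss(
    toy_embeddings, ["John", "Mike", "John", "Mike"])
```

Minimizing a loss of this kind pulls same-speaker utterance representations together and pushes different speakers apart, which is the intuition behind "speaker-discriminative" representations. How the paper actually selects positives and negatives is not specified in this summary.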

Paper Structure

This paper contains 49 sections, 4 equations, 4 figures, 10 tables, 1 algorithm.

Figures (4)

  • Figure 1: This figure illustrates a multi-party dialogue involving four participants: John, Mike, Sarah, and Emma. Each participant engages in discussions on different topics, demonstrating the dynamics and complexity of multi-party conversations.
  • Figure 2: The figure displays the training processes of the CMR framework. Here, an encoder-decoder architecture is presented as an example. Stage I (a) shows the encoder training with contrastive learning. Stage II (b) illustrates the joint training with response generation and contrastive learning.
  • Figure 3: LLM-based pairwise preference comparison on the Friends dataset.
  • Figure 4: Attention weights of the $<$/s$>$ token in the encoder for (a) the CMR model with Stage I contrastive learning, and (b) the CMR model without Stage I contrastive learning. The attention weights are more focused on contextually relevant tokens such as names and pronouns in (a), indicating improved context understanding after Stage I.