SS-MPC: A Sequence-Structured Multi-Party Conversation System
Yoonjin Jang, Keunha Kim, Youngjoong Ko
TL;DR
SS-MPC generates responses in multi-party conversations without an explicit graph encoder: it encodes the dialogue structure as a sequence of soft prompts fed into a Transformer encoder–decoder. It introduces MPC structure tokens (Index, Speaker, and Structure masking) and a post-training stage that teaches the encoder to predict masked structure information, so generation works end to end even when some structural data is missing. On Ubuntu IRC benchmarks, SS-MPC reaches 15.60% BLEU-1 and 12.44% ROUGE-L, which is 3.91 and 0.62 percentage points above the previous state-of-the-art MPC model, and human judges rate it favorably for fluency, relevance, and informativeness. Because structure analysis and response generation happen in a single end-to-end framework, the approach is practical for real-world MPC systems, though the authors note limitations in dataset coverage and in generalization to other domains.
Abstract
Recent Multi-Party Conversation (MPC) models typically rely on graph-based approaches to capture dialogue structures. However, these methods have limitations, such as information loss when utterances are projected into structural embeddings and constraints on leveraging pre-trained language models directly. In this paper, we propose SS-MPC, a response generation model for MPC that eliminates the need for explicit graph structures. Unlike existing models that depend on graphs to analyze conversation structures, SS-MPC internally encodes the dialogue structure as a sequential input, enabling direct use of pre-trained language models. Experimental results show that SS-MPC achieves 15.60% BLEU-1 and 12.44% ROUGE-L, outperforming the current state-of-the-art MPC response generation model by 3.91 percentage points in BLEU-1 and 0.62 percentage points in ROUGE-L. Additionally, human evaluation confirms that SS-MPC generates more fluent and accurate responses than existing MPC models.
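To make the idea of encoding dialogue structure as a sequential input more concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not the authors' implementation: the token formats (`[IDX:n]`, `[SPK:x]`, `[RE:n]`), the `[MASK]` placeholder, and the names `Utterance`, `serialize`, and `mask_structure` are all hypothetical. The sketch writes each utterance's index, speaker, and reply target inline as tokens, then hides a fraction of the structure tokens, which is roughly the kind of input a structure-prediction post-training objective could use.

```python
import random
from dataclasses import dataclass
from typing import List, Optional, Tuple

MASK = "[MASK]"  # hypothetical placeholder for a hidden structure token


@dataclass
class Utterance:
    speaker: str
    text: str
    reply_to: Optional[int]  # index of the utterance being answered; None for a thread root


def serialize(dialogue: List[Utterance]) -> List[str]:
    """Flatten a multi-party dialogue into one token sequence.

    Structure is written inline as (hypothetical) marker tokens,
    so no separate graph encoder is needed to represent it.
    """
    tokens: List[str] = []
    for idx, utt in enumerate(dialogue):
        target = "ROOT" if utt.reply_to is None else str(utt.reply_to)
        tokens += [f"[IDX:{idx}]", f"[SPK:{utt.speaker}]", f"[RE:{target}]"]
        tokens += utt.text.split()  # whitespace split stands in for a real tokenizer
    return tokens


def mask_structure(
    tokens: List[str], rate: float, rng: random.Random
) -> Tuple[List[str], List[Optional[str]]]:
    """Replace speaker and reply-target tokens with MASK at the given rate.

    Returns the masked sequence and per-position labels: the original
    token where a mask was applied, None everywhere else.
    """
    masked: List[str] = []
    labels: List[Optional[str]] = []
    for tok in tokens:
        is_structure = tok.startswith(("[SPK:", "[RE:"))
        if is_structure and rng.random() < rate:
            masked.append(MASK)
            labels.append(tok)
        else:
            masked.append(tok)
            labels.append(None)
    return masked, labels


dialogue = [
    Utterance("A", "my wifi driver fails", None),
    Utterance("B", "which kernel version", 0),
    Utterance("C", "try reinstalling it", 0),
]
sequence = serialize(dialogue)
masked_sequence, targets = mask_structure(sequence, rate=0.5, rng=random.Random(0))
```

In a real system the marker tokens would typically be mapped to learned embeddings rather than handled as plain strings, and masking lets the same model be trained to recover structure that is absent at inference time.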
