Multi-View Sequence-to-Sequence Models with Conversational Structure for Abstractive Dialogue Summarization
Jiaao Chen, Diyi Yang
TL;DR
This work tackles abstractive dialogue summarization with a multi-view sequence-to-sequence model that exploits diverse conversational structures. It automatically extracts four views of a conversation (topic, stage, global, and discrete) and combines them through a multi-view attention mechanism in a BART-based encoder-decoder to generate summaries. On the SAMSum dataset, the model that combines the topic and stage views consistently outperforms baselines in ROUGE scores and is rated favorably by human evaluators. Further analysis highlights the advantages of combining views and identifies remaining challenges such as missing information and incorrect reasoning. The code is publicly available for reproducibility and further research.
Abstract
Text summarization is one of the most challenging and interesting problems in NLP. Although much attention has been paid to summarizing structured text like news reports or encyclopedia articles, summarizing conversations remains relatively under-investigated. Conversations are an essential part of human-human and human-machine interaction, and their most important pieces of information are scattered across various utterances of different speakers. This work proposes a multi-view sequence-to-sequence model that first extracts conversational structures of unstructured daily chats from different views to represent conversations, and then uses a multi-view decoder to incorporate the different views when generating dialogue summaries. Experiments on a large-scale dialogue summarization corpus demonstrate that our methods significantly outperform previous state-of-the-art models in both automatic evaluations and human judgment. We also discuss specific challenges that current approaches face on this task. We have publicly released our code at https://github.com/GT-SALT/Multi-View-Seq2Seq.
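To make the idea of a decoder that incorporates several views more concrete, the following is a minimal sketch in plain Python and not the released implementation. It assumes each view (for example, topic and stage) has already produced a context vector and a scalar relevance score for the current decoding step; the names `softmax` and `combine_views`, the toy vectors, and the way the scores are obtained are all hypothetical. The sketch only shows the general pattern of normalizing per-view scores and taking a weighted sum of the view contexts.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that are positive and sum to 1."""
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def combine_views(view_contexts, view_scores):
    """Weighted sum of per-view context vectors (hypothetical helper).

    view_contexts: one equal-length list of floats per view.
    view_scores:   one raw relevance score per view (assumed to be given).
    Returns the combined context vector and the per-view weights.
    """
    if len(view_contexts) != len(view_scores):
        raise ValueError("need exactly one score per view")
    weights = softmax(view_scores)
    combined = [0.0] * len(view_contexts[0])
    for weight, context in zip(weights, view_contexts):
        for i, value in enumerate(context):
            combined[i] += weight * value
    return combined, weights


# Toy context vectors standing in for a topic view and a stage view.
topic_context = [1.0, 0.0]
stage_context = [0.0, 1.0]

# Equal scores mean both views contribute equally.
combined, weights = combine_views([topic_context, stage_context], [0.0, 0.0])
print(weights)
print(combined)
```

In a trained model the per-view scores would be computed from learned parameters at every decoding step, so the decoder could lean on one view more than another depending on the token being generated.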
