Instructive Dialogue Summarization with Query Aggregations
Bin Wang, Zhengyuan Liu, Nancy F. Chen
TL;DR
The paper tackles the problem of tailoring dialogue summaries to user interests, which traditional methods overlook. It introduces InstructDS, an instruction-tuning framework for dialogues, together with a three-step pipeline that synthesizes query-dialogue-summary (QDS) triples by generating candidate queries, filtering them for quality and diversity, and producing query-based summaries. A single unified model is then trained on three dialogue summarization datasets augmented with these triples. Empirically, InstructDS achieves state-of-the-art results on SAMSum and strong performance on DialogSum and TODSum, and it transfers effectively to the DREAM reading-comprehension task. Human evaluations indicate competitive fluency, informativeness, and conciseness, and strong faithfulness. The approach shows how synthesized QDS data and length-aware instruction tuning can yield flexible, faithful, and concise summaries that adapt to user queries, and it may extend to long dialogues and privacy-aware settings in future work.
Abstract
Conventional dialogue summarization methods directly generate summaries and do not consider users' specific interests. This poses challenges in cases where users are more focused on particular topics or aspects. With the advancement of instruction-finetuned language models, we introduce instruction-tuning to dialogues to expand the capability set of dialogue summarization models. To overcome the scarcity of instructive dialogue summarization data, we propose a three-step approach to synthesize high-quality query-based summarization triples. This process involves summary-anchored query generation, query filtering, and query-based summary generation. By training a unified model called InstructDS (Instructive Dialogue Summarization) on three summarization datasets with multi-purpose instructive triples, we expand the capability of dialogue summarization models. We evaluate our method on four datasets, covering dialogue summarization and dialogue reading comprehension. Experimental results show that our approach outperforms state-of-the-art models, including models of larger size. Additionally, our model exhibits higher generalizability and faithfulness, as confirmed by human subjective evaluations.
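The three-step synthesis named in the abstract (summary-anchored query generation, query filtering, query-based summary generation) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `Triple`, `jaccard`, `filter_queries`, and `synthesize_triples` are hypothetical, the two generation steps are passed in as plain callables where a language model would normally sit, and the filter heuristics (a question-mark check and a token-overlap threshold) are stand-ins for whatever quality and diversity criteria the paper actually uses.

```python
import re
from typing import Callable, List, NamedTuple


class Triple(NamedTuple):
    """One query-dialogue-summary (QDS) training example."""
    query: str
    dialogue: str
    summary: str


def jaccard(a: str, b: str) -> float:
    """Token-level Jaccard similarity, ignoring case and punctuation."""
    ta = set(re.findall(r"\w+", a.lower()))
    tb = set(re.findall(r"\w+", b.lower()))
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def filter_queries(queries: List[str], max_sim: float = 0.5) -> List[str]:
    """Step 2: keep well-formed, mutually dissimilar queries."""
    kept: List[str] = []
    for q in queries:
        q = q.strip()
        # Assumed quality check: a query must be phrased as a question.
        if not q.endswith("?"):
            continue
        # Assumed diversity check: drop near-duplicates of kept queries.
        if any(jaccard(q, k) > max_sim for k in kept):
            continue
        kept.append(q)
    return kept


def synthesize_triples(
    dialogue: str,
    summary: str,
    generate_queries: Callable[[str], List[str]],
    answer_query: Callable[[str, str], str],
) -> List[Triple]:
    """Run the three steps for one dialogue and its reference summary."""
    candidates = generate_queries(summary)   # Step 1: anchored on the summary
    queries = filter_queries(candidates)     # Step 2: quality and diversity
    # Step 3: produce a query-focused summary for each surviving query.
    return [Triple(q, dialogue, answer_query(dialogue, q)) for q in queries]


# Stub generators standing in for language-model calls (hypothetical).
def stub_generate(summary: str) -> List[str]:
    return [
        "Where will they meet?",
        "Where will they meet up?",   # near-duplicate of the first query
        "meeting place",              # not phrased as a question
        "What time is lunch?",
    ]


def stub_answer(dialogue: str, query: str) -> str:
    return f"[summary focused on: {query}]"


dialogue = "A: Lunch at noon? B: Sure, see you at the cafe."
summary = "A and B will have lunch at noon at the cafe."
triples = synthesize_triples(dialogue, summary, stub_generate, stub_answer)
for t in triples:
    print(t.query, "->", t.summary)
```

Passing the generators in as callables keeps the sketch independent of any particular model or API. In practice, the resulting triples would be mixed with the original dialogue-summary pairs to form the instruction-tuning data.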
