MODS: Moderating a Mixture of Document Speakers to Summarize Debatable Queries in Document Collections
Nishant Balepur, Alexa Siu, Nedim Lipka, Franck Dernoncourt, Tong Sun, Jordan Boyd-Graber, Puneet Mathur
TL;DR
This paper introduces Debatable QFS (DQFS), a task that addresses the inability of traditional query-focused summarization to handle questions with opposing viewpoints, and MODS, a framework for solving it. MODS uses a panel-like multi-LLM design in which each document acts as a Speaker LLM and a Moderator LLM orchestrates topic-specific responses, guided by a rich outline that tracks the perspectives each speaker supplies. The authors evaluate on the existing ConflictingQA dataset and on DebateQFS, a new dataset built from Debatepedia, using new citation-based metrics. MODS improves topic paragraph coverage and balance by 38-59% over strong baselines. Results on both datasets, including human evaluations and ablation studies, indicate that content planning through outlines and moderator-controlled speaker interactions substantially improves summaries for debatable queries, with practical implications for balanced information synthesis. Limitations include computational cost and the need for broader human validation; the ethical considerations emphasize careful handling of misinformation and user-guided balance on controversial topics.
Abstract
Query-focused summarization (QFS) gives a summary of documents to answer a query. Past QFS work assumes queries have one answer, ignoring debatable ones (Is law school worth it?). We introduce Debatable QFS (DQFS), a task to create summaries that answer debatable queries via documents with opposing perspectives; summaries must comprehensively cover all sources and balance perspectives, favoring no side. These goals elude LLM QFS systems, which: 1) lack structured content plans, failing to guide LLMs to write balanced summaries, and 2) use the same query to retrieve contexts across documents, failing to cover all perspectives specific to each document's content. To overcome this, we design MODS, a multi-LLM framework mirroring human panel discussions. MODS treats documents as individual Speaker LLMs and has a Moderator LLM that picks speakers to respond to tailored queries for planned topics. Speakers use tailored queries to retrieve relevant contexts from their documents and supply perspectives, which are tracked in a rich outline, yielding a content plan to guide the final summary. Experiments on ConflictingQA with controversial web queries and DebateQFS, our new dataset of debate queries from Debatepedia, show MODS beats SOTA by 38-59% in topic paragraph coverage and balance, based on new citation metrics. Users also find MODS's summaries to be readable and more balanced.
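The moderator and speaker loop described in the abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names Speaker, Outline, moderate, pick_speakers, and tailor_query are hypothetical, the LLM calls are replaced by plain Python callables, and retrieval is reduced to simple word overlap.

```python
import re
from dataclasses import dataclass, field


def words(text):
    """Lowercased alphabetic tokens; a toy stand-in for real retrieval features."""
    return set(re.findall(r"[a-z]+", text.lower()))


@dataclass
class Speaker:
    """Hypothetical stand-in for a Speaker LLM that owns exactly one document."""
    doc_id: str
    sentences: list

    def respond(self, tailored_query):
        # Assumption: retrieval is approximated by word overlap with the query.
        query_words = words(tailored_query)
        best = max(self.sentences, key=lambda s: len(query_words & words(s)))
        # Return nothing if the document has no content relevant to the query.
        return best if query_words & words(best) else None


@dataclass
class Outline:
    """Content plan: maps each topic to the (doc_id, perspective) pairs collected."""
    entries: dict = field(default_factory=dict)

    def add(self, topic, doc_id, perspective):
        self.entries.setdefault(topic, []).append((doc_id, perspective))


def moderate(topics, speakers, pick_speakers, tailor_query):
    """Moderator loop: for each planned topic, pick speakers, send each a
    tailored query, and record the perspectives they return in the outline."""
    outline = Outline()
    for topic in topics:
        for speaker in pick_speakers(topic, speakers):
            perspective = speaker.respond(tailor_query(topic, speaker))
            if perspective is not None:
                outline.add(topic, speaker.doc_id, perspective)
    return outline


# Demo with placeholder policies: every speaker is picked, and the tailored
# query is just the topic string. A real system would use an LLM for both.
speakers = [
    Speaker("doc_a", ["Law school tuition leaves many graduates with heavy debt.",
                      "Campus life is pleasant."]),
    Speaker("doc_b", ["Law graduates often earn high salaries over a career."]),
]
outline = moderate(
    topics=["cost debt", "salaries"],
    speakers=speakers,
    pick_speakers=lambda topic, candidates: candidates,
    tailor_query=lambda topic, speaker: topic,
)
```

In this sketch the outline ends up attributing each perspective to the document that supplied it, which is the property a final summarization step could use to cite sources and check balance across topics.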
