Learning to Summarize by Learning to Quiz: Adversarial Agentic Collaboration for Long Document Summarization
Weixuan Wang, Minghao Wu, Barry Haddow, Alexandra Birch
TL;DR
SummQ introduces an adversarial multi-agent framework for long-document summarization that couples summarization with quizzing. It deploys four agent types (Summary Generators, Quiz Generators, Summary Reviewers, and Quiz Reviewers) plus an Examinee that checks whether the quiz questions can be answered from the summary, enabling iterative refinement across rounds. Generators follow a four-phase process (independent drafting, aggregation, best-draft selection, collective voting), and reviewers follow a four-phase process (independent reviewing, issue categorization, contested-issue debate, final decision) to produce high-quality, verifiable summaries. Empirical results on MENSA, BookSum, and GovReport show state-of-the-art performance on ROUGE, BERTScore, LLM-as-a-Judge, and human evaluations, with the SummQ_combo variant generally outperforming SummQ_solo and the baselines; analyses show how iteration count, agent count, and backbone quality shape performance and cost. The work demonstrates that adversarial agentic collaboration with quiz-based quality checks can significantly improve the quality, coverage, and verifiability of long-document summaries, offering a scalable blueprint for robust abstractive summarization.
Abstract
Long document summarization remains a significant challenge for current large language models (LLMs), as existing approaches commonly struggle with information loss, factual inconsistencies, and coherence issues when processing excessively long documents. We propose SummQ, a novel adversarial multi-agent framework that addresses these limitations through collaborative intelligence between specialized agents operating in two complementary domains: summarization and quizzing. Our approach employs summary generators and reviewers that work collaboratively to create and evaluate comprehensive summaries, while quiz generators and reviewers create comprehension questions that serve as continuous quality checks for the summarization process. This adversarial dynamic, enhanced by an examinee agent that validates whether the generated summary contains the information needed to answer the quiz questions, enables iterative refinement through multifaceted feedback mechanisms. We evaluate SummQ on three widely used long document summarization benchmarks. Experimental results demonstrate that our framework significantly outperforms existing state-of-the-art methods across ROUGE and BERTScore metrics, as well as in LLM-as-a-Judge and human evaluations. Our comprehensive analyses reveal the effectiveness of the multi-agent collaboration dynamics, the influence of different agent configurations, and the impact of the quizzing mechanism. This work establishes a new approach for long document summarization that uses adversarial agentic collaboration to improve summarization quality.
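As an illustration only, the quiz-as-quality-check loop described above can be sketched in Python. This is a minimal sketch under strong assumptions, not the SummQ implementation: the names `QuizItem`, `examinee_can_answer`, `toy_revise`, and `refine_with_quiz` are hypothetical, the examinee is a simple substring match rather than an LLM agent, and the generator and reviewer phases (drafting, voting, debate) are collapsed into a single toy revision step.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class QuizItem:
    question: str
    answer: str  # reference answer taken from the source document


def examinee_can_answer(summary: str, item: QuizItem) -> bool:
    # Toy examinee (assumption): a question counts as answerable when the
    # reference answer appears verbatim in the summary. A real examinee
    # would be an LLM agent reading only the summary.
    return item.answer.lower() in summary.lower()


def toy_revise(summary: str, failed: List[QuizItem], document: List[str]) -> str:
    # Toy summary generator (assumption): for each failed question, append
    # the first source sentence that contains the reference answer.
    for item in failed:
        for sentence in document:
            if item.answer.lower() in sentence.lower() and sentence not in summary:
                summary = f"{summary} {sentence}"
                break
    return summary


def refine_with_quiz(
    document: List[str],
    quiz: List[QuizItem],
    draft: str,
    max_rounds: int = 3,
) -> Tuple[str, int]:
    # Iterate: quiz the examinee on the current summary, collect the
    # questions it cannot answer, and revise the summary using them as
    # feedback. Returns the summary and the number of revision rounds used.
    summary = draft
    for rounds_used in range(max_rounds):
        failed = [q for q in quiz if not examinee_can_answer(summary, q)]
        if not failed:
            return summary, rounds_used
        summary = toy_revise(summary, failed, document)
    return summary, max_rounds


# Hypothetical toy data, used only to exercise the loop.
document = [
    "The finance committee proposed a new budget.",
    "The deadline for comments is in March.",
    "The meeting lasted two hours.",
]
quiz = [
    QuizItem("Who proposed the budget?", "finance committee"),
    QuizItem("When is the deadline?", "March"),
]

final_summary, rounds = refine_with_quiz(document, quiz, draft=document[0])
print(rounds)
print(final_summary)
```

In this toy run the initial draft fails the deadline question, so one revision round adds the missing sentence and the loop then stops. The sketch is only meant to show how unanswerable quiz questions can act as feedback on coverage; the actual framework relies on multiple collaborating agents in each role.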
