Disentangling Specificity for Abstractive Multi-document Summarization
Congbo Ma, Wei Emma Zhang, Hu Wang, Haojie Zhuang, Mingyu Guo
TL;DR
DisentangleSum tackles the gap in multi-document summarization by explicitly modeling document-specific content alongside the document-set representation. It learns per-document specificity with a shared-parameter encoder and enforces orthogonality across document-specific vectors, then combines these with a document-set encoding to generate abstractive summaries. Empirical results on Multi-News and Multi-XScience show improved coverage and ROUGE scores, with human evaluators favoring DisentangleSum on specificity, comprehensiveness, coherence, and relevance; CPL-based training further enhances robustness and scalability. The approach demonstrates that focusing on document-specific information, rather than solely shared content, yields more comprehensive and informative summaries, with practical impact for real-world MDS systems and potential extensions to inter-document similarity analysis.
Abstract
Multi-document summarization (MDS) generates a summary from a document set. Each document in a set describes topic-relevant concepts, while each document also has its own unique content. However, document specificity receives little attention from existing MDS approaches. Neglecting the information specific to each document limits the comprehensiveness of the generated summaries. To solve this problem, we propose to disentangle the specific content of each document in a document set. The document-specific representations, which are encouraged to be distant from each other via a proposed orthogonal constraint, are learned by the specific representation learner. We provide extensive analysis and find that the specific information and the document-set representation contribute distinctive strengths, and that their combination yields a more comprehensive solution for MDS. We also find that the common (i.e., shared) information does not contribute much to overall performance under MDS settings. Implementation code is available at https://github.com/congboma/DisentangleSum.
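
To make the idea of an orthogonal constraint concrete, the following is a minimal sketch of one common way to penalize overlap between per-document vectors: summing the squared cosine similarity over all document pairs. The abstract does not give the exact formulation, so the function name `orthogonality_penalty` and this particular loss are assumptions for illustration only, not the authors' implementation.

```python
import math
from typing import Sequence


def orthogonality_penalty(specific: Sequence[Sequence[float]]) -> float:
    """Sum of squared cosine similarities over all pairs of document vectors.

    `specific` holds one vector per document (hypothetical stand-ins for the
    document-specific representations). The value is 0.0 when all vectors are
    mutually orthogonal and grows as they align.
    """
    # Normalize each vector to unit length so the penalty ignores magnitude.
    unit = []
    for vec in specific:
        norm = max(math.sqrt(sum(x * x for x in vec)), 1e-12)
        unit.append([x / norm for x in vec])

    # Accumulate squared cosine similarity for every unordered pair (i, j).
    penalty = 0.0
    for i in range(len(unit)):
        for j in range(i + 1, len(unit)):
            cos = sum(a * b for a, b in zip(unit[i], unit[j]))
            penalty += cos * cos
    return penalty
```

In a training setup, a term like this would typically be added to the summarization loss with a weighting coefficient, so that minimizing it pushes the document-specific vectors apart while the main loss drives summary quality.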
