Unsupervised Extractive Dialogue Summarization in Hyperdimensional Space
Seongmin Park, Kyungho Kim, Jaejin Seo, Jihwa Lee
TL;DR
HyperSum tackles the need for fast, faithful unsupervised extractive dialogue summarization. It builds sentence embeddings in a high-dimensional space ($D=10{,}000$) using thermometer encoding and position-aware binding, then selects central sentences via $k$-medoids. HyperSum often surpasses state-of-the-art baselines on ROUGE and ExtEval while running 10 to 100 times faster on CPU. The work provides a strong new baseline and an open-source release for unsupervised extractive dialogue summarization.
Abstract
We present HyperSum, an extractive summarization framework that captures both the efficiency of traditional lexical summarization and the accuracy of contemporary neural approaches. HyperSum exploits the pseudo-orthogonality that emerges when vectors are randomly initialized in extremely high dimensions (the "blessing of dimensionality") to construct representative and efficient sentence embeddings. Simply clustering the obtained embeddings and extracting their medoids yields competitive summaries. HyperSum often outperforms state-of-the-art summarizers, in terms of both summary accuracy and faithfulness, while being 10 to 100 times faster. We open-source HyperSum as a strong baseline for unsupervised extractive summarization.
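To make the two ideas in the abstract concrete, the following is a minimal sketch, not the HyperSum implementation. It assumes random bipolar ($\pm 1$) vectors at $D=10{,}000$ and shows that two independent draws have cosine similarity near zero (the standard deviation is roughly $1/\sqrt{D} = 0.01$). It then builds three toy "sentence" vectors by element-wise summation of word vectors, a stand-in assumed here in place of the paper's thermometer encoding and position-aware binding, and picks the medoid of that single group. The helper names `random_bipolar`, `cosine`, and `medoid_index` are hypothetical and introduced only for illustration.

```python
import math
import random

D = 10_000  # dimensionality quoted in the TL;DR


def random_bipolar(dim, rng):
    """Draw a random vector with entries in {-1, +1} (illustrative assumption)."""
    return [rng.choice((-1, 1)) for _ in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def medoid_index(vectors):
    """Index of the vector with the highest total cosine similarity to all
    vectors in the group, i.e. the lowest total cosine distance."""
    totals = [sum(cosine(v, w) for w in vectors) for v in vectors]
    return max(range(len(vectors)), key=totals.__getitem__)


rng = random.Random(0)

# 1) Pseudo-orthogonality: two independent random vectors are nearly orthogonal.
a, b = random_bipolar(D, rng), random_bipolar(D, rng)
sim_ab = cosine(a, b)

# 2) Toy sentence vectors: element-wise sums of random word vectors.
#    Summation is an assumed stand-in, not the paper's encoding scheme.
w1, w2, w3, w4 = (random_bipolar(D, rng) for _ in range(4))
sentences = [
    [x + y for x, y in zip(w1, w2)],  # shares w2 with sentence 1
    [x + y for x, y in zip(w2, w3)],  # shares a word with both neighbours
    [x + y for x, y in zip(w3, w4)],  # shares w3 with sentence 1
]
central = medoid_index(sentences)

print(f"cosine(a, b) = {sim_ab:.4f}")
print(f"medoid sentence index = {central}")
```

Because unrelated word vectors contribute almost nothing to the similarity, the sentence that shares a word with each of the other two ends up as the medoid. A full $k$-medoids run would partition the sentences into $k$ clusters and apply the same selection within each cluster.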
