Unsupervised Extractive Dialogue Summarization in Hyperdimensional Space

Seongmin Park, Kyungho Kim, Jaejin Seo, Jihwa Lee

TL;DR

HyperSum tackles the need for fast, faithful unsupervised extractive dialogue summarization. It builds sentence embeddings in a high-dimensional space ($D=10{,}000$) using thermometer encoding and position-aware binding, then selects central sentences via $k$-medoids. The results show HyperSum often surpasses state-of-the-art baselines in ROUGE and ExtEval while being orders of magnitude faster on CPU. The work provides a strong new baseline and open-source release for unsupervised extractive dialogue summarization.

Abstract

We present HyperSum, an extractive summarization framework that captures both the efficiency of traditional lexical summarization and the accuracy of contemporary neural approaches. HyperSum exploits the pseudo-orthogonality that emerges when randomly initializing vectors at extremely high dimensions ("blessing of dimensionality") to construct representative and efficient sentence embeddings. Simply clustering the obtained embeddings and extracting their medoids yields competitive summaries. HyperSum often outperforms state-of-the-art summarizers -- in terms of both summary accuracy and faithfulness -- while being 10 to 100 times faster. We open-source HyperSum as a strong baseline for unsupervised extractive summarization.
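The pseudo-orthogonality the abstract relies on can be checked numerically. The following is a minimal sketch, not HyperSum's code: it assumes bipolar (+1/-1) random vectors, and the helper names `random_hypervectors` and `max_abs_offdiag_cosine` are hypothetical. It compares the largest pairwise cosine similarity among 50 random vectors at a low dimension and at $D=10{,}000$.

```python
import numpy as np

def random_hypervectors(n, dim, seed=0):
    """Draw n random bipolar (+1/-1) vectors of dimension dim (an assumed encoding)."""
    rng = np.random.default_rng(seed)
    return rng.choice(np.array([-1.0, 1.0]), size=(n, dim))

def max_abs_offdiag_cosine(vectors):
    """Largest |cosine similarity| between any two distinct rows."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sims = unit @ unit.T
    np.fill_diagonal(sims, 0.0)  # ignore each vector's similarity with itself
    return float(np.abs(sims).max())

low = max_abs_offdiag_cosine(random_hypervectors(50, 10))
high = max_abs_offdiag_cosine(random_hypervectors(50, 10_000))
print(f"D=10: {low:.3f}   D=10000: {high:.3f}")
```

At $D=10{,}000$ the cosine between two independent random bipolar vectors has a standard deviation of about $1/\sqrt{D} = 0.01$, so even the worst pair among the 50 vectors stays close to orthogonal, whereas at $D=10$ some pairs are strongly correlated.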

Paper Structure (14 sections, 2 equations, 1 figure, 7 tables)

Figures (1)

  • Figure 1: HyperSum's utterance embeddings for clip #6 from the Behance dataset, visualized with t-SNE (van der Maaten and Hinton, 2008). Different shapes denote different sentence clusters. Shaded markers are the cluster medoids, each selected as the representative summary sentence of its cluster.
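
The cluster-then-pick-medoids step shown in Figure 1 can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the embeddings are toy noisy copies of three random "topic" vectors rather than HyperSum's thermometer-encoded, position-bound embeddings, the clustering is a simple alternating k-medoids with farthest-first seeding, and all function names are hypothetical.

```python
import numpy as np

def cosine_distances(x):
    """Pairwise cosine distance (1 - cosine similarity) between rows."""
    unit = x / np.linalg.norm(x, axis=1, keepdims=True)
    return 1.0 - unit @ unit.T

def farthest_first_init(dist, k):
    """Deterministic seeding: start at row 0, then repeatedly add the
    point farthest from all medoids chosen so far."""
    medoids = [0]
    while len(medoids) < k:
        nearest = dist[:, medoids].min(axis=1)
        medoids.append(int(np.argmax(nearest)))
    return np.array(medoids)

def k_medoids(dist, k, max_iter=50):
    """Alternating k-medoids on a precomputed distance matrix."""
    medoids = farthest_first_init(dist, k)
    for _ in range(max_iter):
        labels = np.argmin(dist[:, medoids], axis=1)  # assign to nearest medoid
        updated = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:
                # New medoid: the member with the smallest total distance to its cluster.
                cost = dist[np.ix_(members, members)].sum(axis=1)
                updated[c] = members[np.argmin(cost)]
        if np.array_equal(updated, medoids):
            break
        medoids = updated
    labels = np.argmin(dist[:, medoids], axis=1)
    return medoids, labels

# Toy data (assumption): 3 topics, 4 utterances each, 10% of signs flipped per utterance.
rng = np.random.default_rng(0)
dim, per_topic = 2_000, 4
topics = rng.choice(np.array([-1.0, 1.0]), size=(3, dim))
noise = np.where(rng.random((3, per_topic, dim)) < 0.1, -1.0, 1.0)
embeddings = (topics[:, None, :] * noise).reshape(-1, dim)  # 12 rows, grouped by topic

medoids, labels = k_medoids(cosine_distances(embeddings), k=3)
summary_rows = sorted(int(m) for m in medoids)  # restore dialogue order
print(summary_rows)
```

Sorting the medoid indices before reading out the utterances keeps the extracted summary in the original dialogue order, which is a design choice of this sketch.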