Efficient Summarization with Read-Again and Copy Mechanism

Wenyuan Zeng, Wenjie Luo, Sanja Fidler, Raquel Urtasun

TL;DR

The paper tackles encoder and decoder inefficiencies in abstractive summarization with two components. The first is a Read-Again encoder that reads the input twice, using the first pass to bias the word representations computed in the second pass. The second is a copy mechanism in the decoder that generates from a small vocabulary and copies out-of-vocabulary (OOV) words from the source. The approach applies to both GRU and LSTM architectures and extends to multi-sentence inputs through hierarchical representations. Empirically, Read-Again with copy achieves state-of-the-art ROUGE scores on Gigaword and DUC2004, and it maintains strong performance with substantially reduced decoder vocabularies, which results in faster decoding. The work improves OOV handling and overall efficiency, and may apply to other sequence-to-sequence tasks such as machine translation.

Abstract

Encoder-decoder models have been widely used to solve sequence-to-sequence prediction tasks. However, current approaches suffer from two shortcomings. First, the encoders compute a representation of each word taking into account only the history of the words they have read so far, yielding suboptimal representations. Second, current decoders utilize large vocabularies in order to minimize the problem of unknown words, resulting in slow decoding times. In this paper we address both shortcomings. Towards this goal, we introduce a simple mechanism that first reads the input sequence before committing to a representation of each word. Furthermore, we propose a simple copy mechanism that is able to exploit very small vocabularies and handle out-of-vocabulary words. We demonstrate the effectiveness of our approach on the Gigaword dataset and the DUC competition, outperforming the state of the art.
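
To make the idea of copying out-of-vocabulary words concrete, the following is a minimal sketch, not the paper's formulation. It assumes a simpler stand-in technique: whenever the decoder emits an unknown-word token, that token is replaced by the source word that received the highest attention weight at that step. The function name `copy_oov_words`, the `<unk>` token, and the attention values are all hypothetical and chosen only for illustration.

```python
UNK = "<unk>"  # hypothetical placeholder for words outside the small decoder vocabulary


def copy_oov_words(decoded_tokens, source_tokens, attention):
    """Replace each UNK in the output with the most-attended source word.

    attention[t][i] is the weight placed on source position i while
    emitting output token t (illustrative values, not model outputs).
    """
    output = []
    for t, token in enumerate(decoded_tokens):
        if token == UNK:
            weights = attention[t]
            # Index of the source position with the largest attention weight.
            best = max(range(len(source_tokens)), key=lambda i: weights[i])
            output.append(source_tokens[best])
        else:
            output.append(token)
    return output


source = ["zeng", "presents", "a", "summarization", "model"]
decoded = [UNK, "presents", "model"]  # "zeng" is assumed to be out of vocabulary
attention = [
    [0.80, 0.10, 0.05, 0.03, 0.02],
    [0.10, 0.70, 0.10, 0.05, 0.05],
    [0.05, 0.05, 0.10, 0.20, 0.60],
]

summary = copy_oov_words(decoded, source, attention)
print(" ".join(summary))
```

Because rare words can be recovered from the source in this way, the decoder's output vocabulary can stay small, which is the source of the faster decoding described above.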

Paper Structure

This paper contains 18 sections, 16 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Read-Again Model
  • Figure 2: Read-Again Model
  • Figure 3: Hierarchical Read-Again
  • Figure 4: Weight Visualization