ABCNN: Attention-Based Convolutional Neural Network for Modeling Sentence Pairs

Wenpeng Yin, Hinrich Schütze, Bing Xiang, Bowen Zhou

TL;DR

ABCNN adds attention to a Siamese CNN so that the representation of each sentence in a pair depends on the other sentence. Attention is computed at several levels of granularity (word and phrase) and integrated into both convolution and pooling. With these changes, ABCNN achieves state-of-the-art or competitive performance on answer selection, paraphrase identification, and textual entailment. The results indicate that inter-sentence attention and multi-granularity representations yield stronger pairwise features; ABCNN-3 often performs best, especially when combined with linguistic features. The work suggests that attention in CNNs is a viable and effective alternative to recurrent architectures for sentence-pair tasks, particularly when sufficient training data is available.

Abstract

How to model a pair of sentences is a critical issue in many NLP tasks such as answer selection (AS), paraphrase identification (PI) and textual entailment (TE). Most prior work (i) deals with one individual task by fine-tuning a specific system; (ii) models each sentence's representation separately, rarely considering the impact of the other sentence; or (iii) relies fully on manually designed, task-specific linguistic features. This work presents a general Attention Based Convolutional Neural Network (ABCNN) for modeling a pair of sentences. We make three contributions. (i) ABCNN can be applied to a wide variety of tasks that require modeling of sentence pairs. (ii) We propose three attention schemes that integrate mutual influence between sentences into CNN; thus, the representation of each sentence takes into consideration its counterpart. These interdependent sentence pair representations are more powerful than isolated sentence representations. (iii) ABCNN achieves state-of-the-art performance on AS, PI and TE tasks.
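
To make the idea of "mutual influence between sentences" concrete, here is a minimal sketch of an attention matrix between two sentences. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the scoring function, so the inverse-distance score `1 / (1 + distance)` used here is assumed, as is the use of row and column sums as per-unit attention weights. The helper names and the toy vectors are hypothetical.

```python
import math

def match_score(x, y):
    """Assumed similarity in (0, 1]: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + math.dist(x, y))

def attention_matrix(s0, s1):
    """A[i][j] scores unit i of sentence 0 against unit j of sentence 1."""
    return [[match_score(x, y) for y in s1] for x in s0]

def attention_weights(A):
    """Assumed per-unit weights: row sums for sentence 0, column sums for sentence 1."""
    a0 = [sum(row) for row in A]
    a1 = [sum(col) for col in zip(*A)]
    return a0, a1

# Toy 2-dimensional "embeddings" (hypothetical values, not from the paper).
s0 = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]   # sentence 0: three units
s1 = [[0.0, 0.0], [3.0, 4.0]]               # sentence 1: two units

A = attention_matrix(s0, s1)      # shape: len(s0) x len(s1)
a0, a1 = attention_weights(A)     # one weight per unit in each sentence
```

In a sketch like this, each sentence's weights are derived from the other sentence's content, which is what makes the two representations interdependent rather than isolated. Such weights could then rescale feature maps before convolution or pooling.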

Paper Structure

This paper contains 10 sections, 4 equations, 4 figures, 6 tables.

Figures (4)

  • Figure 1: Positive ($<\!s_0,s_1^+\!>$) and negative ($<\!s_0,s_1^-\!>$) examples for AS, PI and TE tasks. RH = Random House
  • Figure 2: BCNN: ABCNN without Attention
  • Figure 3: Three ABCNN architectures
  • Figure 4: Attention visualization for TE. Top: unigrams, $b_1$. Middle: conv1, $b_2$. Bottom: conv2, $b_3$.