Phase Conductor on Multi-layered Attentions for Machine Comprehension

Rui Liu, Wei Wei, Weiguang Mao, Maria Chikina

TL;DR

This work introduces PhaseCond, a two-phase, multi-layered attention framework for machine comprehension that couples a question-aware passage representation phase with an evidence-propagation self-attention phase. It advances attention modeling by employing two encoder schemes (independent and weight-sharing) to generate multiple representations for different parts of the attention mechanism and by using outer and inner fusion layers to regulate information flow. Empirical results on SQuAD show PhaseCond outperforming state-of-the-art single- and multi-layered models, with qualitative analyses revealing how attention patterns evolve across layers. The findings offer practical improvements for QA systems and provide deeper understanding of attention dynamics in layered architectures.

Abstract

Attention models have been intensively studied to improve NLP tasks such as machine comprehension, via both question-aware passage attention models and self-matching attention models. Our research proposes the phase conductor (PhaseCond) for attention models and contributes in two ways. First, PhaseCond, an architecture of multi-layered attention models, consists of multiple phases, each implementing a stack of attention layers that produce passage representations and a stack of inner or outer fusion layers that regulate the information flow. Second, we extend and improve the dot-product attention function for PhaseCond by simultaneously encoding multiple question and passage embedding layers from different perspectives. We demonstrate the effectiveness of PhaseCond on the SQuAD dataset, showing that it significantly outperforms both state-of-the-art single-layered and multi-layered attention models. We complement these results with new findings from detailed qualitative analysis and visualized examples that show how attention changes dynamically across the layers.

Paper Structure

This paper contains 13 sections, 8 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: PhaseCond: our proposed attention model structure overview. We use the colored rectangle to highlight the focus of this paper. The question and passage encoder layers and attention layers are colored in blue, the fusion layers are colored in green.
  • Figure 2: Improved question-passage attention model. We use blue color to denote question representations and use green color for passage representations.
  • Figure 3: Dynamic attention changes of multiple layers on a visualized example. The matrices are the attention weights computed by the dot-product attention function before any normalization. Generally, the darker the color, the higher the weight (the only exception is the panel labeled fig:qp-l2, which contains negative values). Given the question "Which NFL team represented the AFC at Super Bowl 50?", the system correctly detects the answer "Denver Broncos" from the passage part "The American Football Conference (AFC) champion Denver Broncos defeated the National Football Conference (NFC) champion Carolina Panthers 24–10 to earn their third Super Bowl title."
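
The Figure 3 caption notes that the plotted matrices are dot-product attention weights before any normalization, and that one panel contains negative values. The minimal sketch below illustrates why that can happen: raw dot products may be negative, while softmax-normalized weights are always positive and sum to one per row. The toy vectors and the helper names `dot_product_scores` and `softmax` are assumptions made for illustration; this is not the paper's implementation, and the paper's improved attention function is not reproduced here.

```python
import math


def dot_product_scores(passage, question):
    """Raw (unnormalized) attention scores: one row per passage token,
    one column per question token."""
    return [
        [sum(p * q for p, q in zip(p_vec, q_vec)) for q_vec in question]
        for p_vec in passage
    ]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


# Made-up toy embeddings: 2 passage tokens, 3 question tokens, dimension 2.
passage = [[1.0, -2.0], [0.5, 0.5]]
question = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

raw = dot_product_scores(passage, question)   # may contain negative entries
weights = [softmax(row) for row in raw]       # strictly positive, rows sum to 1

for raw_row, w_row in zip(raw, weights):
    print(raw_row, [round(w, 3) for w in w_row])
```

A heat map of `raw` therefore needs a color scale that covers negative values, whereas a heat map of `weights` lies strictly between 0 and 1.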