Dynamic Fusion Networks for Machine Reading Comprehension

Yichong Xu, Jingjing Liu, Jianfeng Gao, Yelong Shen, Xiaodong Liu

TL;DR

DFN addresses the challenge of adapting attention and reasoning to diverse questions in MRC by dynamically constructing a per-sample network using reinforcement learning. It jointly optimizes attention fusion across passages, questions, and answers, and a variable-length reasoning process, achieving state-of-the-art results on the RACE dataset. The work demonstrates that combining dynamic fusion with multi-step reasoning yields interpretable attention vectors and substantial performance gains, with ablations confirming the contributions. This approach paves the way for more flexible, task-adaptive MRC models applicable to varied reading tasks.

Abstract

This paper presents a novel neural model, the Dynamic Fusion Network (DFN), for machine reading comprehension (MRC). DFNs differ from most state-of-the-art models in their use of a dynamic multi-strategy attention process, in which passages, questions and answer candidates are jointly fused into attention vectors, along with a dynamic multi-step reasoning module for generating answers. Using reinforcement learning, for each input sample that consists of a question, a passage and a list of candidate answers, an instance of DFN with a sample-specific network architecture can be dynamically constructed by determining what attention strategy to apply and how many reasoning steps to take. Experiments show that DFNs achieve the best result reported on RACE, a challenging MRC dataset that contains real human reading questions in a wide variety of types. A detailed empirical analysis also demonstrates that DFNs can produce attention vectors that summarize information from questions, passages and answer candidates more effectively than other popular MRC models.

Paper Structure

This paper contains 18 sections, 8 equations, 5 figures, 5 tables.

Figures (5)

  • Figure 1: Above: Questions from RACE. Below: Questions from SQuAD.
  • Figure 2: Left: Examples from RACE dataset. Right: Examples from CNN dataset.
  • Figure 3: Architecture of DFN. For simplicity, we only draw DFN for one answer candidate $A$. i) Passage, question and answer candidates are independently mapped through word and character encodings in the Lexicon Encoding Layer. ii) The independent encodings are then fed into a BiLSTM in the Context Encoding Layer. iii) The Dynamic Fusion Layer takes a customized attention strategy across the three representations of passage, question and answer candidates. iv) The Memory Generation Layer generates a working memory. v) The Answer Scoring Module reads in the memory for a dynamic number of steps. vi) The Answer Prediction Module generates the final output.
  • Figure 4: Diagram illustrating the Attention Strategies and Memory Generation in DFN.
  • Figure 5: Examples of DFN's dynamic selection of attention strategy and reasoning steps. Correct answers are in bold italics.