Query-Reduction Networks for Question Answering

Minjoon Seo, Sewon Min, Ali Farhadi, Hannaneh Hajishirzi

TL;DR

This work tackles multi-hop question answering by introducing the Query-Reduction Network (QRN), a lightweight RNN unit that progressively reduces the query as it reads a sequence of context sentences. QRN layers can be stacked and optionally run bidirectionally to capture both local and global dependencies, and the formulation can be parallelized along the time axis to speed up training and inference. Results on the bAbI story-based QA and dialog datasets, as well as the DSTC2 dialog data, show state-of-the-art performance, and ablations show the importance of multiple layers and gating mechanisms. The model also exposes interpretable intermediate queries and gate visualizations, which give insight into the reasoning flow and the attention over facts, while remaining more efficient than traditional RNNs.

Abstract

In this paper, we study the problem of question answering when reasoning over multiple facts is required. We propose Query-Reduction Network (QRN), a variant of Recurrent Neural Network (RNN) that effectively handles both short-term (local) and long-term (global) sequential dependencies to reason over multiple facts. QRN considers the context sentences as a sequence of state-changing triggers, and reduces the original query to a more informed query as it observes each trigger (context sentence) through time. Our experiments show that QRN produces the state-of-the-art results in bAbI QA and dialog tasks, and in a real goal-oriented dialog dataset. In addition, QRN formulation allows parallelization on RNN's time axis, saving an order of magnitude in time complexity for training and inference.
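The time-axis parallelization mentioned in the abstract can be illustrated with a small sketch. It assumes a gated update of the form h_t = z_t * c_t + (1 - z_t) * h_{t-1}, where the gate z_t and the candidate c_t are computed from the inputs at step t only and do not depend on h_{t-1}, and it assumes h_0 = 0. Under those assumptions the recurrence is linear in h, so every state can be written as a weighted sum of candidates and computed independently. The function names are hypothetical, and this is not the authors' implementation.

```python
import math
import random


def reduce_sequential(gates, candidates):
    """Step-by-step update: h_t = z_t * c_t + (1 - z_t) * h_{t-1}, with h_0 = 0."""
    h = [0.0] * len(candidates[0])
    states = []
    for z, c in zip(gates, candidates):
        h = [z * ci + (1.0 - z) * hi for ci, hi in zip(c, h)]
        states.append(h)
    return states


def reduce_closed_form(gates, candidates):
    """Same states, but each h_t is computed without using any other h."""
    d = len(candidates[0])
    states = []
    for t in range(len(gates)):
        h = [0.0] * d
        for i in range(t + 1):
            # Weight of candidate i at time t: z_i * prod_{j=i+1..t} (1 - z_j).
            w = gates[i] * math.prod(1.0 - gates[j] for j in range(i + 1, t + 1))
            h = [hi + w * ci for hi, ci in zip(h, candidates[i])]
        states.append(h)
    return states


# Illustrative random inputs (assumed values, not data from the paper).
rng = random.Random(0)
T, d = 5, 3
gates = [rng.random() for _ in range(T)]
candidates = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(T)]

seq = reduce_sequential(gates, candidates)
par = reduce_closed_form(gates, candidates)
max_diff = max(abs(a - b) for hs, hp in zip(seq, par) for a, b in zip(hs, hp))
```

Here the outer loop over `t` in `reduce_closed_form` has no dependency between iterations, which is what makes parallel evaluation possible. An efficient version would use batched matrix operations rather than nested Python loops.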

Paper Structure

This paper contains 39 sections, 10 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: (a) QRN unit, (b) 2-layer QRN on a 5-sentence story, and (c) the entire QA system (QRN and input / output modules). ${\bm x}, {\bm q}, \hat{\bm y}$ are the story, question and predicted answer in natural language, respectively. ${\bf x}=\langle {\bf x}_1, \ldots , {\bf x}_T \rangle, {\bf q}, \hat{\bf y}$ are their corresponding vector representations (upright font). $\alpha$ and $\rho$ are the update gate and reduce functions, respectively. $\hat{\bf y}$ is assigned to be ${\bf h}^2_5$, the local query at the last time step in the last layer. Red-colored text shows the inferred meanings of the vectors (see 'Interpretations' in the results section).
  • Figure 2: Schematics of QRN and the two state-of-the-art models, End-to-End Memory Networks (N2N) and Improved Dynamic Memory Networks (DMN+), simplified to emphasize the differences among the models. AGRU is a variant of GRU, proposed by DMN, in which the update gate is replaced with soft attention. For QRN and DMN+, only forward-direction arrows are shown.
  • Figure 3: (top) bAbI QA dataset \cite{babi}: visualization of update and reset gates in the QRN '2r' model. (bottom two) bAbI dialog and DSTC2 dialog datasets \cite{bordes2016learning}: visualization of update and reset gates in the QRN '2r' model. Note that the stories can have as many as 800+ sentences; only part of each is shown here. More visualizations are shown in Figure 4 (bAbI QA) and Figure 5 (dialog datasets).
  • Figure 4: Visualization of update and reset gates in the QRN '2r' model on several tasks of bAbI QA (see the bAbI QA results table). There is no reset gate in the last layer. Note that only some of the most recent sentences are shown here, though the stories can have as many as 200+ sentences.
  • Figure 5: Visualization of update and reset gates in the QRN '2r' model on several tasks of bAbI dialog and DSTC2 dialog (see the dialog results table). There is no reset gate in the last layer. Note that only some of the most recent sentences are shown here, even though the dialogs have more sentences.
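
The stacked setup described in the Figure 1 caption (a 2-layer QRN over a 5-sentence story, with $\hat{\bf y}$ taken from the last time step of the last layer) might be sketched as follows. The caption only names $\alpha$ as the update gate and $\rho$ as the reduce function. The specific forms used here (a scalar sigmoid gate over the elementwise product of sentence and query, and a tanh layer over their concatenation), the zero initial state, the choice to feed each layer's outputs to the next layer as its queries, and all parameter names are assumptions made for illustration. The reset gate of the '2r' variant is omitted.

```python
import math
import random


def sigmoid(a):
    return 1.0 / (1.0 + math.exp(-a))


def make_params(d, rng):
    """Random weights for one layer (hypothetical parameter layout)."""
    return {
        "w_z": [rng.uniform(-1, 1) for _ in range(d)],
        "b_z": 0.0,
        "W_h": [[rng.uniform(-1, 1) for _ in range(2 * d)] for _ in range(d)],
        "b_h": [0.0] * d,
    }


def alpha(x, q, p):
    """Update gate (assumed form): scalar in (0, 1) from the product x * q."""
    return sigmoid(sum(w * xi * qi for w, xi, qi in zip(p["w_z"], x, q)) + p["b_z"])


def rho(x, q, p):
    """Reduce function (assumed form): candidate query from the concatenation [x; q]."""
    xq = x + q  # list concatenation
    return [math.tanh(sum(w * v for w, v in zip(row, xq)) + b)
            for row, b in zip(p["W_h"], p["b_h"])]


def qrn_layer(xs, qs, p):
    """Run one layer over the story and return the reduced query at every step."""
    h = [0.0] * len(qs[0])
    out = []
    for x, q in zip(xs, qs):
        z, c = alpha(x, q, p), rho(x, q, p)
        h = [z * ci + (1.0 - z) * hi for ci, hi in zip(c, h)]
        out.append(h)
    return out


# Illustrative random sentence and question vectors (not real embeddings).
rng = random.Random(0)
d, T = 4, 5
xs = [[rng.uniform(-1, 1) for _ in range(d)] for _ in range(T)]
q = [rng.uniform(-1, 1) for _ in range(d)]
layer1, layer2 = make_params(d, rng), make_params(d, rng)

h1 = qrn_layer(xs, [q] * T, layer1)  # layer 1 sees the original query at every step
h2 = qrn_layer(xs, h1, layer2)       # layer 2 sees layer 1's reduced queries
y_hat = h2[-1]                       # h^2_5: last time step of the last layer
```

In a full system, `y_hat` would be passed to an output module that decodes it into a natural-language answer; that module is outside this sketch.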