Explainable Multi-hop Question Generation: An End-to-End Approach without Intermediate Question Labeling

Seonjeong Hwang, Yunsu Kim, Gary Geunbae Lee

TL;DR

This work tackles the challenge of generating complex multi-hop questions by introducing the End-to-End Question Rewriting (E2EQR) model, which incrementally increases question complexity through sequential rewriting while enabling end-to-end training without intermediate question labels. The approach hinges on an unfoldable Transformer-based RNN with accumulated self- and cross-attention, guided by a document graph that arranges input materials and bridge entities to support coherent rewrites across hops. An adaptive curriculum learning strategy enables robust learning from 1-hop to N-hop questions, improving performance and mitigating catastrophic forgetting. Empirical results on MuSiQue and HotpotQA show strong question quality under both automatic and human evaluation, and synthetic multi-hop QA data generated by E2EQR improves QA model training, especially for higher-hop questions. The method yields explainable, logically structured multi-hop questions and demonstrates practical value for data augmentation in multi-hop QA tasks.
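The adaptive curriculum is only summarized here, so the following is a generic illustration of hop-level curriculum learning, not the paper's schedule. It assumes a hypothetical rule: training starts on 1-hop questions, the next hop level is unlocked once validation loss on the current hardest level fails to improve by at least `min_delta` for `patience` evaluations, and easier levels keep being sampled so earlier skills are rehearsed (a common replay-style guard against catastrophic forgetting). The class `AdaptiveCurriculum`, its parameters, and the 2:1 sampling weight are all assumptions.

```python
import random
from dataclasses import dataclass


@dataclass
class AdaptiveCurriculum:
    """Hypothetical hop-level curriculum (not the paper's exact criterion).

    Training starts with 1-hop questions; a harder hop level is unlocked
    when validation loss on the current hardest level plateaus.
    """
    max_hops: int
    patience: int = 2          # evaluations without improvement before unlocking
    min_delta: float = 0.01    # minimum loss drop that counts as improvement
    current_max: int = 1       # hardest hop level currently in training
    best_loss: float = float("inf")
    stale_evals: int = 0

    def update(self, val_loss: float) -> bool:
        """Record a validation loss; return True if a new hop level was unlocked."""
        if val_loss < self.best_loss - self.min_delta:
            self.best_loss = val_loss
            self.stale_evals = 0
            return False
        self.stale_evals += 1
        if self.stale_evals >= self.patience and self.current_max < self.max_hops:
            self.current_max += 1
            self.best_loss = float("inf")  # losses on the new level are not comparable
            self.stale_evals = 0
            return True
        return False

    def sample_hop(self, rng: random.Random) -> int:
        """Pick a hop level for the next example; easier levels stay in the
        mix (replay), while the newest level gets twice the weight."""
        hops = list(range(1, self.current_max + 1))
        weights = [2.0 if h == self.current_max else 1.0 for h in hops]
        return rng.choices(hops, weights=weights, k=1)[0]


# Usage with made-up validation losses for the 1-hop level.
curriculum = AdaptiveCurriculum(max_hops=4)
unlocked = [curriculum.update(loss) for loss in [1.0, 0.8, 0.795, 0.798]]
rng = random.Random(0)
batch_hops = [curriculum.sample_hop(rng) for _ in range(8)]
print(unlocked, curriculum.current_max, batch_hops)
```

With these made-up losses the plateau is detected at the fourth evaluation, so 2-hop questions are unlocked and subsequent samples are drawn from levels 1 and 2. A real adaptive criterion could instead use accuracy, per-level loss ratios, or a fixed budget per level.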

Abstract

In response to the increasing use of interactive artificial intelligence, the demand for systems that can handle complex questions has grown. Multi-hop question generation aims to generate complex questions that require multi-step reasoning over several documents. Previous studies have predominantly used end-to-end models, in which questions are decoded from the representation of the context documents. However, these approaches cannot explain the reasoning process behind the generated multi-hop questions. The question rewriting approach, which incrementally increases question complexity, also has a limitation: it requires labeled data for intermediate-stage questions. In this paper, we introduce an end-to-end question rewriting model that increases question complexity through sequential rewriting. The proposed model can be trained with only the final multi-hop questions, without intermediate questions. Experimental results demonstrate the effectiveness of our model in generating complex questions, particularly 3- and 4-hop questions, that are appropriately paired with the input answers. We also prove that our model logically and incrementally increases the complexity of questions, and that the generated multi-hop questions are beneficial for training question answering models.
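To make sequential rewriting concrete, this toy sketch hard-codes what a trained model would generate: each step takes the previous question and replaces the bridge entity with a phrase that identifies it from the next document, so the answer stays fixed while the hop count grows. The `Hop` record, the string substitution, and every entity name are hypothetical illustrations; E2EQR produces each rewrite with a neural decoder, not a rule. Keeping every intermediate question is what makes the reasoning chain inspectable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hop:
    """One rewriting step (hypothetical structure)."""
    bridge_entity: str  # entity that appears in the current question
    description: str    # phrase from the next document that identifies it


def rewrite(question: str, hop: Hop) -> str:
    """Toy rewrite: swap the bridge entity for its description.

    A learned model would generate this text; rule-based substitution is
    used here only to show how each step adds one hop.
    """
    if hop.bridge_entity not in question:
        raise ValueError(f"bridge entity {hop.bridge_entity!r} not found in question")
    return question.replace(hop.bridge_entity, hop.description, 1)


def sequential_rewrite(first_question: str, hops: list[Hop]) -> list[str]:
    """Return [q1, q2, ..., qN]: the 1-hop question plus every rewrite."""
    questions = [first_question]
    for hop in hops:
        questions.append(rewrite(questions[-1], hop))
    return questions


# Fictional entities; the answer (Mira Solen's birth date) never changes.
chain = sequential_rewrite(
    "When was Mira Solen born?",
    [
        Hop("Mira Solen", "the founder of Kestrel Labs"),
        Hop("Kestrel Labs", "the company that built the Aurora probe"),
    ],
)
for n_hops, q in enumerate(chain, start=1):
    print(f"{n_hops}-hop: {q}")
```

The final entry is "When was the founder of the company that built the Aurora probe born?", and the two earlier entries are the intermediate questions that explain how it was built.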

Paper Structure (26 sections, 5 equations, 3 figures, 12 tables, 1 algorithm)

Figures (3)

  • Figure 1: Example of multi-hop question generation through question rewriting.
  • Figure 2: Unfolded architecture of the proposed model, shown for the training process of 3-hop question generation. In the decoder, the multi-head masked self-attention and multi-head cross-attention layers use the key and value matrices ($K$ and $V$) accumulated from the prior steps to rewrite the intermediate question generated in the previous step. Detailed Transformer components (Vaswani et al., 2017) are omitted from this figure.
  • Figure 3: The performance of three QA models trained on the synthetic data generated by BART and E2EQR and the MuSiQue training$_{unseen}$ set (Ground Truth), respectively.
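The Figure 2 caption says the decoder's self- and cross-attention layers reuse key and value matrices accumulated from earlier rewriting steps. The following is a minimal sketch of that mechanic, assuming "accumulated" means the $K$ and $V$ rows from every previous step are concatenated along the sequence axis (much like a key/value cache), with single-head scaled dot-product attention and no masking or learned projections. The `AccumulatedKV` class and the toy vectors are hypothetical.

```python
import math


def softmax(scores: list[float]) -> list[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(query: list[float], keys: list[list[float]],
              values: list[list[float]]) -> list[float]:
    """Scaled dot-product attention for a single query vector (one head, no mask)."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]


class AccumulatedKV:
    """Hypothetical store of K/V rows from all rewriting steps so far."""

    def __init__(self) -> None:
        self.keys: list[list[float]] = []
        self.values: list[list[float]] = []

    def add_step(self, step_keys: list[list[float]], step_values: list[list[float]]) -> None:
        """Append the K/V rows produced at the current step."""
        self.keys.extend(step_keys)
        self.values.extend(step_values)

    def attend(self, query: list[float]) -> list[float]:
        """Attend over every step seen so far, not just the latest one."""
        return attention(query, self.keys, self.values)


# Three rewriting steps (as in 3-hop generation), each adding two toy K/V rows.
kv = AccumulatedKV()
for step in range(1, 4):
    kv.add_step(
        step_keys=[[float(step), 0.0], [0.0, float(step)]],
        step_values=[[float(step), float(step)], [-float(step), 0.0]],
    )
    output = kv.attend([0.5, 0.5])
    print(f"step {step}: attends over {len(kv.keys)} rows -> {[round(x, 3) for x in output]}")
```

Under this assumption the number of attended rows grows from 2 to 6 over the three steps, so later rewrites can condition on everything produced earlier. In a full decoder the rows would come from learned per-layer projections, and masked self-attention would additionally hide future tokens within the current step.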