Learning to Answer by Learning to Ask: Getting the Best of GPT-2 and BERT Worlds

Tassilo Klein, Moin Nabi

TL;DR

The paper tackles the data-hungry nature of QA by linking question generation (QG) with question answering (QA) in a single framework. It introduces an end-to-end system that combines GPT-2 for QG with BERT for span-based QA, using QA feedback to train QG in a semi-supervised loop. Results on SQuAD 1.1 show diverse, semantically valid questions and substantial QA performance gains, with QA-based surrogate metrics providing an evaluation signal that goes beyond lexical similarity. The work highlights the potential for reduced annotation burden and improved QG in low-data regimes, and points toward future unsupervised QG development.

Abstract

Automatic question generation aims at generating questions from a context, with the corresponding answers being sub-spans of the given passage. Whereas most methods rely on heuristic rules to generate questions, neural network approaches have been proposed more recently. In this work, we propose a variant of the self-attention Transformer architecture to generate meaningful and diverse questions. To this end, we propose an easy-to-use model that combines the Transformer decoder GPT-2 with the Transformer encoder BERT for the downstream task of question answering. The model is trained in an end-to-end fashion, where the language model is trained to produce a question-answer-aware input representation that facilitates generating an answer-focused question. Our results for neural question generation from text on the SQuAD 1.1 dataset suggest that our method can produce semantically correct and diverse questions. Additionally, we assessed the performance of our proposed method on the downstream task of question answering. The analysis shows that our proposed generation and answering collaboration framework improves both tasks in relative terms and is particularly powerful in the semi-supervised setup. The results further suggest a robust and comparably lean pipeline that facilitates question generation in the small-data regime.

Paper Structure

This paper contains 11 sections, 4 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: Illustration of the pre-training sketch: Each network, i.e. GPT-2 and BERT, is individually trained to answer questions using a QA head that assigns each token a probability of being the beginning and/or end of the answer span. The small blue box corresponds to an annotated answer. The small orange square denotes the question, whereas the green box indicates the answer span annotation returned by the models. (Best viewed in color)
  • Figure 2: Overview of the fine-tuning stage of the approach: Given a SQuAD context and an annotated answer (blue box), a question is generated using GPT-2. The generated question is denoted by the orange box of question marks. The SQuAD context, together with the generated question, is given to the pre-trained BERT network. BERT then generates an answer span, denoted by the green box. If BERT is unable to provide the correct answer, the language model's loss is backpropagated to GPT-2 w.r.t. the annotated context. (Best viewed in color)
  • Figure 3: Some qualitative examples of the questions generated by the proposed method. Each box contains the context. Within the context, the answer tag is delimited by ">>" and "<<". Colored text denotes questions generated with different language models. "GPT-2 LM" corresponds to the GPT-2 LM fine-tuned for question generation without optimization. "BERT-Feedback" corresponds to the approach using BERT as the QA feedback module during training. "GPT-2 Feedback" corresponds to the approach employing GPT-2 as the feedback mechanism. "GT" stands for ground truth. The diversity of the questions generated by the proposed approach illustrates the semantic richness of the question generation.
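
The three captions above describe answer tagging, span prediction from per-token start/end probabilities, and a feedback step in which the language-model loss is used only when the QA model misses the annotated answer. The following is a minimal, framework-free sketch of those steps, not the authors' implementation. The function names (`tag_answer`, `best_span`, `feedback_loss`), the spacing around the ">>" and "<<" delimiters, the `max_len` cap, and the use of an exact span match as the definition of a "correct" answer are all assumptions made for illustration; the probabilities are hand-written stand-ins for real model outputs.

```python
from typing import Sequence, Tuple


def tag_answer(context: str, answer_start: int, answer_text: str) -> str:
    """Wrap the annotated answer in '>>' and '<<' (delimiter spacing is assumed)."""
    end = answer_start + len(answer_text)
    if context[answer_start:end] != answer_text:
        raise ValueError("answer_text does not match the context at answer_start")
    return f"{context[:answer_start]}>> {answer_text} <<{context[end:]}"


def best_span(
    start_probs: Sequence[float],
    end_probs: Sequence[float],
    max_len: int = 30,  # hypothetical cap on span length, in tokens
) -> Tuple[int, int]:
    """Return the (start, end) token indices maximizing P(start) * P(end), start <= end."""
    best = (0, 0)
    best_score = -1.0
    for s, p_start in enumerate(start_probs):
        for e in range(s, min(s + max_len, len(end_probs))):
            score = p_start * end_probs[e]
            if score > best_score:
                best_score = score
                best = (s, e)
    return best


def feedback_loss(
    lm_loss: float,
    predicted_span: Tuple[int, int],
    gold_span: Tuple[int, int],
) -> float:
    """Keep the language-model loss only if the QA model missed the annotated span.

    'Missed' is taken here to mean 'not an exact span match', which is an assumption.
    """
    return 0.0 if predicted_span == gold_span else lm_loss


if __name__ == "__main__":
    context = "The Eiffel Tower is in Paris."
    print(tag_answer(context, context.index("Paris"), "Paris"))

    # Stand-in QA-head outputs for the tokens:
    # ["The", "Eiffel", "Tower", "is", "in", "Paris", "."]
    start_probs = [0.05, 0.05, 0.05, 0.05, 0.10, 0.60, 0.10]
    end_probs = [0.05, 0.05, 0.10, 0.05, 0.05, 0.60, 0.10]
    predicted = best_span(start_probs, end_probs)
    print(predicted, feedback_loss(2.5, predicted, gold_span=(5, 5)))
```

In an actual training loop, the returned value would be a differentiable loss tensor rather than a float, so that a zero leaves the generator unchanged on examples the QA model already answers correctly.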