UnitedQA: A Hybrid Approach for Open Domain Question Answering

Hao Cheng, Yelong Shen, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao

TL;DR

The paper tackles open-domain QA with UnitedQA, a hybrid framework that combines an ELECTRA-based extractive reader with a T5-FiD-based generative reader to exploit their complementary strengths. Each reader is enhanced with targeted training techniques: multi-objective weak supervision and posterior differential regularization (PDR) for the extractive reader, and decoder attention bias and adversarial training for the generative reader. Their outputs are fused via simple linear interpolation, which improves exact match (EM) on NaturalQuestions and TriviaQA. The hybrid surpasses both single models and homogeneous ensembles, albeit at a higher computational cost, and ablation and error analyses show each component’s contribution and failure modes.
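
To make the fusion step concrete, here is a minimal sketch of linearly interpolating the answer scores of two readers. It is an illustration under stated assumptions, not the authors' implementation: the function name `interpolate_answers`, the weight `alpha`, and the assumption that each reader returns a dictionary mapping answer strings to scores on a comparable scale are all hypothetical.

```python
from collections import defaultdict


def interpolate_answers(extractive, generative, alpha=0.5):
    """Fuse two readers' answer scores by linear interpolation.

    extractive, generative: dict mapping answer string -> score
        (assumed to be on a comparable scale, e.g. normalized to [0, 1]).
    alpha: hypothetical weight on the extractive reader; the generative
        reader receives weight (1 - alpha).
    Returns the top-scoring answer and the full fused score table.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if not extractive and not generative:
        raise ValueError("at least one reader must propose an answer")

    fused = defaultdict(float)
    # An answer proposed by only one reader gets an implicit score of 0
    # from the other reader.
    for answer, score in extractive.items():
        fused[answer] += alpha * score
    for answer, score in generative.items():
        fused[answer] += (1.0 - alpha) * score

    best = max(fused, key=fused.get)
    return best, dict(fused)


if __name__ == "__main__":
    # Toy scores, invented purely for illustration.
    ext = {"paris": 0.6, "lyon": 0.4}
    gen = {"paris": 0.3, "marseille": 0.7}
    print(interpolate_answers(ext, gen, alpha=0.5))
```

In this sketch an answer that both readers propose accumulates weight from both sides, which is one way a hybrid can benefit when the two readers disagree on their top choice. In practice the interpolation weight would be tuned on a development set.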

Abstract

To date, most recent work under the retrieval-reader framework for open-domain QA focuses exclusively on either an extractive or a generative reader. In this paper, we study a hybrid approach for leveraging the strengths of both models. We apply novel techniques to enhance both extractive and generative readers built upon recent pretrained neural language models, and find that proper training methods can provide large improvements over previous state-of-the-art models. We demonstrate that a simple hybrid approach that combines answers from both readers can efficiently take advantage of extractive and generative answer inference strategies and outperforms single models as well as homogeneous ensembles. Our approach outperforms previous state-of-the-art models by 3.3 and 2.7 points in exact match on NaturalQuestions and TriviaQA, respectively.

Paper Structure

This paper contains 19 sections, 10 equations, 2 figures, 7 tables.

Figures (2)

  • Figure 1: Pairwise prediction agreement ratio. G-1, G-2, G-3 and E-1, E-2, E-3 are three different generative and three different extractive readers, respectively. All readers achieve similar performance ($\approx 52\%$ exact match) on NaturalQuestions. Higher agreement ($>50\%$) is shown in red and lower agreement ($<50\%$) in gray. Agreement is calculated based on exact string match.
  • Figure 2: Relative accuracy on different WH question types. The relative accuracy is the relative change of a WH category's accuracy with respect to the overall model accuracy.
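
The two quantities named in these captions can be sketched as below. This is an interpretation, not the authors' evaluation code: the function names are hypothetical, agreement is computed on raw strings (the captions do not say whether answers are normalized first), and relative accuracy is read as (category accuracy − overall accuracy) / overall accuracy.

```python
from itertools import combinations


def agreement_ratio(preds_a, preds_b):
    """Fraction of questions on which two readers give the same string."""
    if len(preds_a) != len(preds_b):
        raise ValueError("prediction lists must be aligned per question")
    if not preds_a:
        raise ValueError("prediction lists must be non-empty")
    # Assumption: raw exact string equality, with no answer normalization.
    matches = sum(a == b for a, b in zip(preds_a, preds_b))
    return matches / len(preds_a)


def pairwise_agreement(readers):
    """Agreement ratio for every unordered pair of readers.

    readers: dict mapping a reader name (e.g. "G-1") to its list of
        predicted answer strings, one per question, in the same order.
    """
    return {
        (name_a, name_b): agreement_ratio(readers[name_a], readers[name_b])
        for name_a, name_b in combinations(sorted(readers), 2)
    }


def relative_accuracy(category_accuracy, overall_accuracy):
    """Relative change of one category's accuracy w.r.t. overall accuracy."""
    if overall_accuracy == 0:
        raise ValueError("overall accuracy must be non-zero")
    return (category_accuracy - overall_accuracy) / overall_accuracy


if __name__ == "__main__":
    # Toy predictions, invented purely for illustration.
    toy = {
        "G-1": ["1969", "paris", "mars"],
        "E-1": ["1969", "lyon", "mars"],
    }
    print(pairwise_agreement(toy))
    print(relative_accuracy(0.60, 0.50))
```

Under this reading, a positive relative accuracy means the model does better on that WH category than it does overall, and a negative value means it does worse.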