UnitedQA: A Hybrid Approach for Open Domain Question Answering
Hao Cheng, Yelong Shen, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao
TL;DR
The paper tackles open-domain QA with UnitedQA, a hybrid framework that combines an ELECTRA-based extractive reader with a T5-FiD-based generative reader to exploit their complementary strengths. Each reader is first improved with targeted training: multi-objective weak supervision and posterior differential regularization (PDR) for the extractive reader; decoder attention bias and adversarial training for the generative reader. Their outputs are then fused via a simple linear interpolation, which raises exact match (EM) on NaturalQuestions and TriviaQA. The hybrid surpasses both single models and homogeneous ensembles, albeit at higher computational cost, and ablation and error analyses show each component’s contribution and failure modes.
Abstract
To date, most recent work under the retrieval-reader framework for open-domain QA focuses exclusively on either an extractive or a generative reader. In this paper, we study a hybrid approach that leverages the strengths of both. We apply novel techniques to enhance both extractive and generative readers built upon recent pretrained neural language models, and find that proper training methods can provide large improvements over previous state-of-the-art models. We demonstrate that a simple hybrid approach that combines answers from both readers can efficiently take advantage of extractive and generative answer inference strategies, and that it outperforms single models as well as homogeneous ensembles. Our approach outperforms previous state-of-the-art models by 3.3 and 2.7 points in exact match on NaturalQuestions and TriviaQA, respectively.
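
To make the idea of combining answers from both readers concrete, the following is a minimal sketch of a linear interpolation over answer scores. It is an illustration under stated assumptions, not the paper's implementation: the function names (normalize, combine_readers), the mixing parameter weight, and the toy scores are all hypothetical, and the section above does not specify how reader scores are normalized or how the interpolation weight is chosen.

```python
from collections import defaultdict


def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so equivalent strings share a key.

    Hypothetical helper; the normalization actually used is not given above.
    """
    return " ".join(answer.lower().split())


def combine_readers(extractive, generative, weight=0.5):
    """Linearly interpolate answer scores from two readers.

    extractive, generative: lists of (answer_string, score) pairs.
    weight: assumed mixing coefficient in [0, 1]; the extractive reader
            gets `weight`, the generative reader gets `1 - weight`.
    Returns the (normalized_answer, combined_score) pair with the top score.
    """
    scores = defaultdict(float)
    for answer, score in extractive:
        scores[normalize(answer)] += weight * score
    for answer, score in generative:
        scores[normalize(answer)] += (1.0 - weight) * score
    return max(scores.items(), key=lambda item: item[1])


# Toy scores, invented for illustration only.
extractive_predictions = [("Paris", 0.6), ("Lyon", 0.4)]
generative_predictions = [("paris", 0.3), ("Marseille", 0.7)]

best_answer, best_score = combine_readers(
    extractive_predictions, generative_predictions, weight=0.5
)
print(best_answer)  # "paris": both readers give it some support
```

In this toy case neither reader's top answer is shared, yet the answer both readers support wins after interpolation, which is the kind of complementarity a hybrid of the two reader types is meant to exploit. In practice, weight would presumably be tuned on held-out data.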
