Transfer Learning Enhanced Single-choice Decision for Multi-choice Question Answering
Chenhao Cui, Yufan Jiang, Shuangzhi Wu, Zhoujun Li
TL;DR
This paper reframes multi-choice machine reading comprehension as a per-option binary classification problem, enabling transfer learning from diverse QA datasets beyond the target MMRC tasks. It introduces a single-choice model built on a pretrained encoder with layer-wise adaptive attention, scoring each candidate option independently via g(P,Q,A_i) = σ(W H_L + b) where H_L combines per-layer representations. Transfer learning from datasets such as SQuAD2.0, ARC, and CoQA is achieved by preprocessing data into a common input format, training on mixed data, and fine-tuning on the target RACE data, yielding state-of-the-art results on RACE and DREAM. The approach demonstrates that decoupling option evaluation and leveraging broader QA resources can improve robustness and generalization in MMRC systems, with practical impact for cross-domain QA systems and ensemble performance.
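The scoring rule g(P,Q,A_i) = σ(W H_L + b) can be illustrated with a minimal sketch. The summary only says that H_L combines per-layer representations, so the softmax-weighted sum used here is an assumption, as are the names `combine_layers`, `score_option`, and `layer_logits`. Plain Python lists stand in for encoder outputs.

```python
import math
from typing import List


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def combine_layers(layer_vectors: List[List[float]],
                   layer_logits: List[float]) -> List[float]:
    """ASSUMED form of H_L: a softmax-weighted sum of per-layer vectors.

    layer_vectors holds one pooled vector per encoder layer;
    layer_logits holds one (learned) scalar per layer.
    """
    weights = softmax(layer_logits)
    dim = len(layer_vectors[0])
    return [
        sum(w * vec[d] for w, vec in zip(weights, layer_vectors))
        for d in range(dim)
    ]


def score_option(layer_vectors: List[List[float]],
                 layer_logits: List[float],
                 W: List[float],
                 b: float) -> float:
    """Confidence that one option is correct: sigmoid(W . H_L + b)."""
    h = combine_layers(layer_vectors, layer_logits)
    z = sum(w_i * h_i for w_i, h_i in zip(W, h)) + b
    return 1.0 / (1.0 + math.exp(-z))
```

Each option is scored on its own (P, Q, A_i) encoding, so the score for one option does not depend on the other candidates.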
Abstract
Multi-choice Machine Reading Comprehension (MMRC) aims to select the correct answer from a set of options based on a given passage and question. Existing methods employ a pre-trained language model as the encoder and share and transfer knowledge through fine-tuning. These methods mainly focus on designing elaborate mechanisms to capture the relationships among the passage, question and answer options. Transferring knowledge from other MRC tasks such as SQuAD is non-trivial but has been largely ignored, because of the task-specific format of MMRC. In this paper, we recast multi-choice as single-choice by training a binary classifier to decide whether a given answer is correct, and then select the option with the highest confidence score as the final answer. Our proposed method drops the multi-choice framework and can therefore leverage resources from other tasks. We build our model on ALBERT-xxlarge and evaluate it on the RACE and DREAM datasets. Experimental results show that our model performs better than multi-choice methods. In addition, by transferring knowledge from other kinds of MRC tasks, our model achieves state-of-the-art results in both single and ensemble settings.
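The recasting described in the abstract can be sketched as follows: one multi-choice example becomes several binary-labelled examples for training, and at inference the option with the highest confidence is chosen. The function names and the word-overlap scorer are hypothetical stand-ins for illustration; in the paper the scorer is a trained ALBERT-xxlarge binary classifier.

```python
from typing import Callable, List, Tuple

# (passage, question, option, label): label is 1 if the option is correct, else 0.
BinaryExample = Tuple[str, str, str, int]
Scorer = Callable[[str, str, str], float]


def to_single_choice(passage: str, question: str,
                     options: List[str], answer_index: int) -> List[BinaryExample]:
    """Split one multi-choice example into one binary example per option."""
    return [
        (passage, question, option, 1 if i == answer_index else 0)
        for i, option in enumerate(options)
    ]


def predict(passage: str, question: str,
            options: List[str], scorer: Scorer) -> int:
    """Score every option independently and return the index of the best one."""
    scores = [scorer(passage, question, option) for option in options]
    return max(range(len(options)), key=scores.__getitem__)


def toy_overlap_scorer(passage: str, question: str, option: str) -> float:
    """Hypothetical stand-in scorer: fraction of option words found in the passage."""
    passage_words = set(passage.lower().split())
    option_words = option.lower().split()
    if not option_words:
        return 0.0
    return sum(w in passage_words for w in option_words) / len(option_words)


passage = "The cat sat on the mat"
question = "Where did the cat sit?"
options = ["on the mat", "in the car", "under a tree"]

train_examples = to_single_choice(passage, question, options, answer_index=0)
best = predict(passage, question, options, toy_overlap_scorer)
```

Because every training example has the same (passage, question, answer, label) shape, data from other QA tasks can in principle be mixed in once it is converted to that shape; how each source dataset is converted is not specified here.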
