DuoRC: Towards Complex Language Understanding with Paraphrased Reading Comprehension
Amrita Saha, Rahul Aralikatte, Mitesh M. Khapra, Karthik Sankaranarayanan
TL;DR
DuoRC presents a large-scale RC dataset built from parallel movie plots (Wikipedia vs. IMDb) where questions are generated from one version and answers are drawn from the other, creating low lexical overlap and requiring external knowledge, coreference resolution, and multi-sentence inference. The authors establish baselines with SpanModel (BiDAF) and GenModel (span prediction plus abstractive generation) and demonstrate that ParaphraseRC is substantially harder than SelfRC, with preprocessing and data augmentation offering limited gains. The work shows that existing SQuAD-style models perform poorly on this dataset, highlighting new research directions for narrative reasoning, unanswerability detection, and cross-version paraphrase understanding. As a complementary benchmark, DuoRC aims to drive progress toward more robust, knowledge-enabled QA systems capable of complex language understanding.
Abstract
We propose DuoRC, a novel dataset for Reading Comprehension (RC) that motivates several new challenges for neural approaches in language understanding beyond those offered by existing RC datasets. DuoRC contains 186,089 unique question-answer pairs created from a collection of 7,680 pairs of movie plots, where each pair in the collection reflects two versions of the same movie, one from Wikipedia and the other from IMDb, written by two different authors. We asked crowdsourced workers to create questions from one version of the plot and a different set of workers to extract or synthesize answers from the other version. This unique characteristic of DuoRC, where questions and answers are created from different versions of a document narrating the same underlying story, ensures by design that there is very little lexical overlap between the questions created from one version and the segments containing the answer in the other version. Further, since the two versions have different levels of plot detail, narration style, vocabulary, etc., answering questions from the second version requires deeper language understanding and incorporating external background knowledge. Additionally, the narrative style of passages arising from movie plots (as opposed to typical descriptive passages in existing datasets) exhibits the need to perform complex reasoning over events across multiple sentences. Indeed, we observe that state-of-the-art neural RC models which have achieved near-human performance on the SQuAD dataset, even when coupled with traditional NLP techniques to address the challenges presented in DuoRC, exhibit very poor performance (F1 score of 37.42% on DuoRC vs. 86% on the SQuAD dataset). This opens up several interesting research avenues wherein DuoRC could complement other RC datasets to explore novel neural approaches for studying language understanding.
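To make the notion of low lexical overlap concrete, the following minimal sketch computes a Jaccard overlap between the word sets of a question and a candidate answer sentence. This is an illustration only and is not claimed to be the overlap measure used for DuoRC; the function names and the example sentences are hypothetical and were invented for this sketch rather than taken from the dataset.

```python
import re


def tokens(text):
    """Lowercase the text and return its set of alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard_overlap(question, passage):
    """Jaccard similarity between the word sets of a question and a passage."""
    q, p = tokens(question), tokens(passage)
    if not q or not p:
        return 0.0
    return len(q & p) / len(q | p)


# Hypothetical sentences, invented for illustration (not from DuoRC).
question = "Who does the detective arrest at the harbor?"
# Sentence from the same plot version the question was written from.
same_version = "The detective arrests the smuggler at the harbor."
# Sentence from a second version narrating the same event differently.
other_version = (
    "Down by the docks, the inspector finally takes the smuggler into custody."
)

self_overlap = jaccard_overlap(question, same_version)
para_overlap = jaccard_overlap(question, other_version)

print(f"same-version overlap:  {self_overlap:.3f}")
print(f"other-version overlap: {para_overlap:.3f}")
```

In this toy case the paraphrased sentence shares almost no content words with the question, so a model that relies on surface matching has little to anchor on, which is the effect the two-version design aims for.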
