HEAD-QA: A Healthcare Dataset for Complex Reasoning

David Vilares, Carlos Gómez-Rodríguez

TL;DR

HEAD-QA introduces a domain-specific, multilingual multi-choice QA benchmark drawn from Spanish healthcare exams to probe complex medical reasoning. The authors compare monolingual (Spanish) and cross-lingual (English) setups using both information retrieval baselines and neural readers (DrQA, BiDAF, DGEM, Decompatt), and find a clear gap between machine and human performance. Cross-lingual IR often outperforms monolingual IR, while the neural methods struggle with long, technical questions, which points to a need for better information extraction and reasoning in domain-specific QA. The dataset serves as a testbed for multilingual, domain-focused QA research, with possible extensions to open-domain settings and a larger set of questions.

Abstract

We present HEAD-QA, a multi-choice question answering testbed to encourage research on complex reasoning. The questions come from exams to access a specialized position in the Spanish healthcare system, and are challenging even for highly specialized humans. We then consider monolingual (Spanish) and cross-lingual (to English) experiments with information retrieval and neural techniques. We show that: (i) HEAD-QA challenges current methods, and (ii) the results lag well behind human performance, demonstrating its usefulness as a benchmark for future work.

Paper Structure

This paper contains 24 sections, 1 figure, 9 tables.

Figures (1)

  • Figure 1: Image no 21 from MIR 2017