CasiMedicos-Arg: A Medical Question Answering Dataset Annotated with Explanatory Argumentative Structures
Ekaterina Sviridova, Anar Yeginbergen, Ainara Estarrona, Elena Cabrio, Serena Villata, Rodrigo Agerri
TL;DR
This work tackles the need for explainable medical QA by introducing CasiMedicos-Arg, a multilingual dataset of 558 clinical cases labeled with doctor-authored explanations annotated for argumentative structure (claims, premises, support/attack) across English, Spanish, French, and Italian. The authors provide detailed annotation guidelines, measure inter-annotator agreement, and demonstrate a robust projection pipeline to create a multilingual resource, accompanied by strong baselines for argument component detection using diverse LM architectures. They show that multilingual data-transfer and decoder-only models yield competitive performance, highlighting the benefits of cross-language transfer for explainable medical reasoning. The dataset, code, and models are publicly released to advance research in medical argument mining and explainable AI in clinical contexts.
Abstract
Explaining the decisions of Artificial Intelligence (AI) systems is a major challenge, in particular when they are applied to sensitive scenarios like medicine and law. However, the need to explain the rationale behind decisions is also a central issue for human-based deliberation, as it is important to justify why a certain decision has been taken. Resident medical doctors, for instance, are required not only to provide a (possibly correct) diagnosis, but also to explain how they reached a certain conclusion. Developing new tools to help residents train their explanation skills is therefore a central objective of AI in education. In this paper, we follow this direction and present, to the best of our knowledge, the first multilingual dataset for Medical Question Answering in which correct and incorrect diagnoses for a clinical case are enriched with a natural language explanation written by doctors. These explanations have been manually annotated with argument components (i.e., premise, claim) and argument relations (i.e., attack, support), resulting in the Multilingual CasiMedicos-Arg dataset, which consists of 558 clinical cases in four languages (English, Spanish, French, Italian) with explanations, where we annotated 5021 claims, 2313 premises, 2431 support relations, and 1106 attack relations. We conclude by showing how competitive baselines perform over this challenging dataset for the argument mining task.
