Overview of BioASQ 2023: The eleventh BioASQ challenge on Large-Scale Biomedical Semantic Indexing and Question Answering

Anastasios Nentidis, Georgios Katsimpras, Anastasia Krithara, Salvador Lima López, Eulália Farré-Maduell, Luis Gasco, Martin Krallinger, Georgios Paliouras

TL;DR

The paper surveys the eleventh BioASQ challenge (BioASQ11, CLEF 2023), detailing three tasks: Task 11b for biomedical QA in English, Task Synergy 11 for open-question QA via iterative expert feedback, and the new MedProcNER track for Spanish clinical procedures. It presents dataset scales, evaluation protocols, and participant engagement across tasks, including descriptions of training and test splits, baselines, and multilingual resources. Preliminary results reveal progressive gains in yes/no and factoid answering and illustrate the evolving role of transformer and GPT-based methods, alongside the emergence of multilingual annotation and indexing resources. The work underscores BioASQ's ongoing impact on advancing biomedical semantic indexing and QA, extending multilingual capabilities and collaborative, expert-in-the-loop approaches with plans for further data and task expansion.

Abstract

This is an overview of the eleventh edition of the BioASQ challenge in the context of the Conference and Labs of the Evaluation Forum (CLEF) 2023. BioASQ is a series of international challenges promoting advances in large-scale biomedical semantic indexing and question answering. This year, BioASQ consisted of new editions of the two established tasks b and Synergy, and a new task (MedProcNER) on semantic annotation of clinical content in Spanish with medical procedures, which have a critical role in medical practice. In this edition of BioASQ, 28 competing teams submitted the results of more than 150 distinct systems in total for the three different shared tasks of the challenge. Similarly to previous editions, most of the participating systems achieved competitive performance, suggesting the continuous advancement of the state-of-the-art in the field.

Paper Structure (15 sections, 3 figures, 10 tables)

Figures (3)

  • Figure 1: The iterative dialogue between the experts and the systems in the BioASQ Synergy task on question answering for developing biomedical problems.
  • Figure 2: Overview of the MedProcNER Shared Task.
  • Figure 3: The evaluation scores of the best-performing systems in task B, Phase B, for exact answers, across the eleven years of the BioASQ challenge. Since BioASQ6, the official measure for Yes/No questions is the macro-averaged F1 score (macro F1), but accuracy (Acc) is also presented as the former official measure. The black dots in 10.6 highlight that these scores are for an additional batch with questions from new experts [nentidis2021overview].
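
The Figure 3 caption contrasts macro F1 with accuracy for Yes/No questions. The following is a minimal sketch of how the two measures can diverge on an imbalanced batch. It is not the official BioASQ evaluation code; the function names, the label strings "yes" and "no", and the toy data are assumptions made for illustration.

```python
from typing import Sequence


def f1_for_label(gold: Sequence[str], pred: Sequence[str], label: str) -> float:
    """F1 score of one label, treating that label as the positive class."""
    tp = sum(1 for g, p in zip(gold, pred) if g == label and p == label)
    fp = sum(1 for g, p in zip(gold, pred) if g != label and p == label)
    fn = sum(1 for g, p in zip(gold, pred) if g == label and p != label)
    if tp == 0:
        # No true positives: precision or recall is zero (or undefined), so F1 is 0.
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def macro_f1(gold: Sequence[str], pred: Sequence[str],
             labels: Sequence[str] = ("yes", "no")) -> float:
    """Unweighted mean of the per-label F1 scores."""
    return sum(f1_for_label(gold, pred, label) for label in labels) / len(labels)


def accuracy(gold: Sequence[str], pred: Sequence[str]) -> float:
    """Fraction of questions answered correctly."""
    return sum(1 for g, p in zip(gold, pred) if g == p) / len(gold)


# Hypothetical toy batch: 8 "yes" and 2 "no" questions,
# answered by a system that always says "yes".
gold_answers = ["yes"] * 8 + ["no"] * 2
system_answers = ["yes"] * 10

acc = accuracy(gold_answers, system_answers)
mf1 = macro_f1(gold_answers, system_answers)
```

On this toy batch the always-"yes" system scores well on accuracy but poorly on macro F1, because the "no" class contributes an F1 of zero to the average. This illustrates why a macro-averaged measure is less forgiving of ignoring the minority answer.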