Biomedical Entity Linking as Multiple Choice Question Answering

Zhenxi Lin; Ziheng Zhang; Xian Wu; Yefeng Zheng

TL;DR

BioELQA tackles the challenge of fine-grained and long-tailed biomedical entity linking by reframing BioEL as a Multiple Choice Question Answering task. It employs a bi-encoder retriever to propose top-$N$ candidate entities, a generator that outputs the symbol of the chosen candidate via a retrieval-enhanced MCP prompt, and a $k$NN module to bring in similar training instances as contextual cues. Empirical results on NCBI, BC5CDR, and COMETA show state-of-the-art accuracy, with ablations confirming the contributions of data augmentation and the retrieval memory. The approach explicitly models both mention-entity and entity-entity interactions and demonstrates improved robustness for long-tailed and morphologically similar entities, offering a practical, scalable solution for BioEL without relying on external synonym corpora. The work suggests future directions to incorporate contextual disambiguation alongside the retrieval-augmented framework.

Abstract

Although biomedical entity linking (BioEL) has made significant progress with pre-trained language models, challenges still exist for fine-grained and long-tailed entities. To address these challenges, we present BioELQA, a novel model that treats Biomedical Entity Linking as Multiple Choice Question Answering. BioELQA first obtains candidate entities with a fast retriever, jointly presents the mention and candidate entities to a generator, and then outputs the predicted symbol associated with its chosen entity. This formulation enables explicit comparison of different candidate entities, thus capturing fine-grained interactions between mentions and entities, as well as among entities themselves. To improve generalization for long-tailed entities, we retrieve similar labeled training instances as clues and concatenate the input with retrieved instances for the generator. Extensive experimental results show that BioELQA outperforms state-of-the-art baselines on several datasets.
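The multiple-choice formulation described above can be illustrated with a minimal sketch. It assumes the retriever has already returned a ranked candidate list and that the retrieved training instances are available as (mention, entity) pairs. The prompt layout, the `Clue:` prefix, and the function names `build_mcq_prompt` and `resolve_answer` are hypothetical and chosen only for illustration; they are not the authors' actual template or code.

```python
from string import ascii_uppercase


def build_mcq_prompt(mention, candidates, neighbors=()):
    """Format a mention and its candidate entities as a multiple-choice question.

    `neighbors` holds (mention, entity_name) pairs taken from similar labeled
    training instances; they are prepended to the prompt as clues.
    Returns the prompt text and a symbol -> entity lookup table.
    """
    if len(candidates) > len(ascii_uppercase):
        raise ValueError("too many candidates for single-letter symbols")

    lines = [f"Clue: {m} -> {e}" for m, e in neighbors]
    lines.append(f"Mention: {mention}")

    symbol_to_entity = {}
    for symbol, entity in zip(ascii_uppercase, candidates):
        lines.append(f"({symbol}) {entity}")
        symbol_to_entity[symbol] = entity

    lines.append("Answer:")
    return "\n".join(lines), symbol_to_entity


def resolve_answer(symbol, symbol_to_entity):
    """Map a generated symbol such as 'A' or '(a)' back to its entity name.

    Returns None when the symbol does not match any listed candidate.
    """
    cleaned = symbol.strip().strip("()").upper()
    return symbol_to_entity.get(cleaned)


# Illustrative inputs (made up for this sketch, not taken from the datasets).
prompt, mapping = build_mcq_prompt(
    "heart attack",
    ["Myocardial Infarction", "Heart Failure", "Cardiac Arrest"],
    neighbors=[("cardiac infarction", "Myocardial Infarction")],
)
predicted = resolve_answer("A", mapping)
```

Because the generator only has to emit a single symbol, all candidates appear side by side in one input, which is what allows the explicit comparison among entities that the abstract describes.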

Paper Structure (13 sections, 4 equations, 2 figures, 6 tables)

Introduction
Method
Retriever
Generator
$k$NN Module
Experiments
Experimental Setup
Overall Results
Ablation Study
Case Study
Impacts of hyper-parameters
Conclusion
Bibliographical References

Figures (2)

Figure 1: The overview of the proposed BioELQA.
Figure 2: Impact of different hyper-parameters on the COMETA dataset.
