MedPath: Multi-Domain Cross-Vocabulary Hierarchical Paths for Biomedical Entity Linking

Nishant Mishra; Wilker Aziz; Iacer Calixto

TL;DR

MedPath addresses three problems in biomedical entity linking (EL): semantic fragmentation, limited explainability, and semantically blind evaluation. It harmonizes nine expert-annotated corpora into one large, multi-domain EL dataset, normalizes all entities to UMLS Concept Unique Identifiers (CUIs), and provides cross-vocabulary mappings to up to 62 vocabularies along with full hierarchical paths in up to 11 vocabularies. The work also introduces hierarchy-aware evaluation metrics and reports initial retrieval, reranking, and evaluation results that indicate substantial benefits from a cross-domain, semantically enriched benchmark. The resource supports training more interpretable and interoperable clinical NLP models and evaluating biomedical EL across diverse vocabularies and knowledge graphs.

Abstract

Progress in biomedical Named Entity Recognition (NER) and Entity Linking (EL) is currently hindered by a fragmented data landscape, a lack of resources for building explainable models, and the limitations of semantically blind evaluation metrics. To address these challenges, we present MedPath, a large-scale, multi-domain biomedical EL dataset that builds upon nine existing expert-annotated EL datasets. In MedPath, all entities are 1) normalized using the latest version of the Unified Medical Language System (UMLS), 2) augmented with mappings to 62 other biomedical vocabularies, and, crucially, 3) enriched with full ontological paths (i.e., from general to specific) in up to 11 biomedical vocabularies. MedPath directly enables new research frontiers in biomedical NLP, facilitating the training and evaluation of semantically rich and interpretable EL systems and the development of the next generation of interoperable and explainable clinical NLP models.
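
To make the three enrichment steps concrete, the following is a minimal Python sketch of what one linked entity could look like and how a full ontological path allows partial credit for a near-miss prediction. Everything in it is an assumption for illustration: the `LinkedEntity` layout, the field names, the placeholder identifiers, the example path labels, and the `path_overlap` score are hypothetical and are not taken from the MedPath schema or from the paper's hierarchy-aware metrics.

```python
from dataclasses import dataclass, field


@dataclass
class LinkedEntity:
    """Hypothetical record layout; field names are illustrative only."""
    mention: str
    cui: str  # UMLS concept identifier (placeholder value in the example)
    # vocabulary name -> code in that vocabulary (placeholders below)
    mappings: dict[str, str] = field(default_factory=dict)
    # vocabulary name -> path ordered from general to specific
    paths: dict[str, list[str]] = field(default_factory=dict)


def path_overlap(gold: list[str], pred: list[str]) -> float:
    """Length of the shared prefix divided by the longer path length.

    An illustrative hierarchy-aware score, not the metric from the paper:
    1.0 for identical paths, lower values the earlier the paths diverge.
    """
    if not gold or not pred:
        return 0.0
    shared = 0
    for g, p in zip(gold, pred):
        if g != p:
            break
        shared += 1
    return shared / max(len(gold), len(pred))


# Illustrative path labels; not copied from any real vocabulary.
gold_path = ["Disease", "Endocrine disorder", "Diabetes mellitus",
             "Type 2 diabetes mellitus"]
near_miss = ["Disease", "Endocrine disorder", "Diabetes mellitus",
             "Type 1 diabetes mellitus"]
far_miss = ["Disease", "Cardiovascular disorder", "Hypertension"]

entity = LinkedEntity(
    mention="T2DM",
    cui="CUI_PLACEHOLDER",
    mappings={"VOCAB_A": "CODE_PLACEHOLDER"},
    paths={"VOCAB_A": gold_path},
)

near_score = path_overlap(entity.paths["VOCAB_A"], near_miss)
far_score = path_overlap(entity.paths["VOCAB_A"], far_miss)
exact_score = path_overlap(entity.paths["VOCAB_A"], gold_path)
```

Under an exact-match metric both wrong predictions would score zero. With the path-based score in this sketch, the sibling concept scores 0.75 and the unrelated concept scores 0.25, which is the kind of distinction a semantically blind metric cannot make.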
