Enhancing Document Retrieval for Curating N-ary Relations in Knowledge Bases
Xing David Wang, Ulf Leser
TL;DR
This work tackles the challenge of retrieving documents to complete n-ary relations for biomedical knowledge-base curation. It introduces EDEL, a dense bi-encoder retriever that leverages weak supervision from KB-linked publications and employs a layered margin loss along with KB-aware hard negative sampling to handle noisy signals. On two new benchmarks, Precision Oncology (PO) and Post-Translational Modifications (PTM), EDEL achieves state-of-the-art performance, with notable gains in NDCG@10 and EntityRecall over zero-shot and fine-tuned baselines. The approach aims to make evidence retrieval for complex biomedical curation tasks more efficient and reliable, and it could be extended to other domains that require structured relation completion.
Abstract
Curation of biomedical knowledge bases (KBs) relies on extracting accurate multi-entity relational facts from the literature, a process that remains largely manual and expert-driven. An essential step in this workflow is retrieving documents that can support or complete partially observed n-ary relations. We present a neural retrieval model designed to assist KB curation by identifying documents that help fill in missing relation arguments and provide relevant contextual evidence. To reduce dependence on scarce gold-standard training data, we exploit existing KB records to construct weakly supervised training sets. Our approach introduces two key technical contributions: (i) a layered contrastive loss that enables learning from noisy and incomplete relational structures, and (ii) a balanced sampling strategy that generates high-quality negatives from diverse KB records. On two biomedical retrieval benchmarks, our approach achieves state-of-the-art performance, outperforming strong baselines in NDCG@10 by 5.7 and 3.7 percentage points, respectively.
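
Because the headline results are reported in NDCG@10, a minimal sketch of how that metric is commonly computed may help readers interpret the numbers. This is not the authors' evaluation code. The function names `dcg_at_k` and `ndcg_at_k` are hypothetical, the gain is assumed to be linear in the relevance grade (some evaluations use an exponential gain instead), and the input list is assumed to hold the graded relevance of every judged document for one query, in the order the retriever ranked them.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k relevance grades (rank order)."""
    # Rank 1 is discounted by log2(2) = 1, rank 2 by log2(3), and so on.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """NDCG@k: DCG of the given ranking divided by the DCG of the ideal ranking."""
    # The ideal ranking sorts the same judged documents by decreasing relevance.
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    # Queries with no relevant documents are scored 0 by convention (an assumption).
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Hypothetical example: relevance grades of one query's results, in ranked order.
ranked_grades = [2, 0, 1, 0, 0]
score = ndcg_at_k(ranked_grades, k=10)
```

NDCG@10 lies between 0 and 1, so an improvement of 5.7 percentage points corresponds to an absolute difference of 0.057 on this scale.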
