RELATE: Relation Extraction in Biomedical Abstracts with LLMs and Ontology Constraints
Olawumi Olasunkanmi, Mathew Satusky, Hong Yi, Chris Bizon, Harlin Lee, Stanley Ahalt
TL;DR
RELATE tackles the problem of incomplete biomedical knowledge graphs (KGs) by grounding free-text relation extraction in ontology predicates. It introduces a three-stage pipeline (ontology preprocessing, similarity-based retrieval, and context-aware LLM reranking) paired with negation handling and NONE rejection to produce ontology-aligned KG edges. Evaluations on ChemProt and HEAL demonstrate improved exact-match performance (52% exact match and 94% accuracy@10 on ChemProt) and broad predicate coverage, while analyses of rejections and negations highlight practical safeguards for real-world literature. The approach offers a scalable, modular path to converting unstructured literature into standardized, interoperable biomedical knowledge graphs.
Abstract
Biomedical knowledge graphs (KGs) are vital for drug discovery and clinical decision support but remain incomplete. Large language models (LLMs) excel at extracting biomedical relations, yet their outputs lack standardization and alignment with ontologies, limiting KG integration. We introduce RELATE, a three-stage pipeline that maps LLM-extracted relations to standardized ontology predicates using ChemProt and the Biolink Model. The pipeline includes: (1) ontology preprocessing with predicate embeddings, (2) similarity-based retrieval enhanced with SapBERT, and (3) LLM-based reranking with explicit negation handling. This approach transforms relation extraction from free-text outputs to structured, ontology-constrained representations. On the ChemProt benchmark, RELATE achieves 52% exact match and 94% accuracy@10, and in 2,400 HEAL Project abstracts, it effectively rejects irrelevant associations (0.4%) and identifies negated assertions. RELATE captures nuanced biomedical relationships while ensuring quality for KG augmentation. By combining vector search with contextual LLM reasoning, RELATE provides a scalable, semantically accurate framework for converting unstructured biomedical literature into standardized KGs.
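The three stages described above can be illustrated with a minimal, self-contained sketch. It is not the RELATE implementation. The predicate list, the negation cue set, the `min_score` threshold, and all function names are hypothetical. A bag-of-words vector stands in for the SapBERT-enhanced embeddings, and a simple rule stands in for the LLM reranker, so only the control flow (embed predicates, retrieve the top candidates, then rerank with NONE rejection and a negation flag) reflects the text.

```python
import math
import re
from collections import Counter

# Hypothetical mini-ontology: predicate name -> short description.
# Illustrative only; not the actual Biolink Model or ChemProt predicate sets.
PREDICATES = {
    "inhibits": "chemical inhibits or decreases activity of protein",
    "activates": "chemical activates or increases activity of protein",
    "binds": "chemical binds to protein receptor",
    "treats": "drug treats or alleviates disease",
}

# Assumed cue words; a real system would use the LLM's contextual judgment.
NEGATION_CUES = {"not", "no", "without", "failed", "neither"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def embed(text):
    """Stage 1 stand-in: bag-of-words counts instead of a learned embedding."""
    return Counter(tokenize(text))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(predicates):
    """Stage 1: precompute one vector per ontology predicate."""
    return {name: embed(f"{name} {desc}") for name, desc in predicates.items()}


def retrieve(relation_text, index, k=3):
    """Stage 2: return the k predicates most similar to the extracted relation."""
    query = embed(relation_text)
    scored = [(cosine(query, vec), name) for name, vec in index.items()]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # ties broken by name
    return scored[:k]


def rerank(candidates, sentence, min_score=0.2):
    """Stage 3 stand-in: pick the top candidate, reject weak matches as NONE,
    and flag the assertion as negated when a cue word appears in the sentence."""
    negated = any(tok in NEGATION_CUES for tok in tokenize(sentence))
    best_score, best_name = candidates[0]
    if best_score < min_score:
        return {"predicate": "NONE", "negated": negated}
    return {"predicate": best_name, "negated": negated}


def map_relation(relation_text, sentence, predicates=PREDICATES):
    """Run all three stages for one extracted relation and its source sentence."""
    index = build_index(predicates)
    return rerank(retrieve(relation_text, index), sentence)


if __name__ == "__main__":
    print(map_relation("inhibits the activity of",
                       "Aspirin inhibits the activity of COX-1."))
    print(map_relation("inhibits", "Compound X did not inhibit kinase Y."))
    print(map_relation("was measured alongside",
                       "Glucose was measured alongside insulin."))
```

In this toy setup the first call maps to `inhibits`, the second maps to `inhibits` with the negation flag set, and the third is rejected as `NONE` because no predicate description overlaps with the relation phrase. Keeping retrieval and reranking as separate functions mirrors the modularity the abstract emphasizes: either stage could be replaced (for example, by a dense encoder or an LLM call) without changing the other.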
