RELATE: Relation Extraction in Biomedical Abstracts with LLMs and Ontology Constraints

Olawumi Olasunkanmi, Mathew Satusky, Hong Yi, Chris Bizon, Harlin Lee, Stanley Ahalt

TL;DR

RELATE addresses incomplete biomedical knowledge graphs by grounding free-text relation extraction in ontology predicates. It introduces a three-stage pipeline (ontology preprocessing, similarity-based retrieval, and context-aware LLM reranking), paired with negation handling and NONE rejection, to produce ontology-aligned KG edges. On ChemProt, RELATE reaches 52% exact match and 94% accuracy@10; on 2,400 HEAL Project abstracts, it rejects irrelevant associations (0.4%) and identifies negated assertions, which are practical safeguards for real-world literature. The approach offers a scalable, modular path to converting unstructured literature into standardized, interoperable biomedical knowledge graphs.

Abstract

Biomedical knowledge graphs (KGs) are vital for drug discovery and clinical decision support but remain incomplete. Large language models (LLMs) excel at extracting biomedical relations, yet their outputs lack standardization and alignment with ontologies, limiting KG integration. We introduce RELATE, a three-stage pipeline that maps LLM-extracted relations to standardized ontology predicates using ChemProt and the Biolink Model. The pipeline includes: (1) ontology preprocessing with predicate embeddings, (2) similarity-based retrieval enhanced with SapBERT, and (3) LLM-based reranking with explicit negation handling. This approach transforms relation extraction from free-text outputs to structured, ontology-constrained representations. On the ChemProt benchmark, RELATE achieves 52% exact match and 94% accuracy@10, and in 2,400 HEAL Project abstracts, it effectively rejects irrelevant associations (0.4%) and identifies negated assertions. RELATE captures nuanced biomedical relationships while ensuring quality for KG augmentation. By combining vector search with contextual LLM reasoning, RELATE provides a scalable, semantically accurate framework for converting unstructured biomedical literature into standardized KGs.
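
The two ChemProt metrics quoted above can be illustrated with a small sketch. It assumes that "exact match" means the top-ranked predicate equals the gold predicate (that is, accuracy@1) and that accuracy@k counts an example as correct when the gold predicate appears anywhere in the top-k candidates; the paper's exact definitions may differ. The function names and the toy predicate strings are hypothetical.

```python
from typing import Sequence


def accuracy_at_k(ranked: Sequence[Sequence[str]], gold: Sequence[str], k: int) -> float:
    """Fraction of examples whose gold predicate is among the first k candidates.

    Assumed definition, not taken from the paper.
    """
    if len(ranked) != len(gold):
        raise ValueError("ranked and gold must have the same length")
    if k < 1:
        raise ValueError("k must be at least 1")
    if not gold:
        return 0.0
    hits = sum(1 for candidates, g in zip(ranked, gold) if g in candidates[:k])
    return hits / len(gold)


def exact_match(ranked: Sequence[Sequence[str]], gold: Sequence[str]) -> float:
    """Exact match, treated here as accuracy@1 (an assumption)."""
    return accuracy_at_k(ranked, gold, 1)


if __name__ == "__main__":
    # Toy ranked candidate lists (best first) and gold labels, for illustration only.
    ranked = [["treats", "affects"], ["affects", "treats"], ["causes", "treats"]]
    gold = ["treats", "treats", "related_to"]
    print(exact_match(ranked, gold))
    print(accuracy_at_k(ranked, gold, 2))
```

Under these definitions, a large gap between exact match and accuracy@10 would indicate that retrieval usually surfaces the correct predicate but that picking it as the top choice is the harder step.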

Paper Structure

This paper contains 32 sections, 4 figures, 7 tables.

Figures (4)

  • Figure 1: The three-stage RELATE pipeline: (1) ontology preprocessing, (2) similarity-based retrieval, and (3) contextual refinement. Ontology preprocessing generates embeddings for predicates and their negated variants; this stage is rerun only if the ontology schema or the embedding models change. Given an input quadruple (subject, object, relation text, and abstract context), RELATE performs similarity-based retrieval by embedding the relation text and retrieving the top-$k$ ontology candidates. In the final stage, contextual refinement reranks these candidates with an LLM using the full abstract context, producing the most semantically appropriate ontology predicate.
  • Figure 2: Negation generation prompt used in Stage 1 (Section \ref{stages:stages1}) for creating negative descriptors from positive descriptors.
  • Figure 3: Contextual reranking prompt used in Stage 3 (Section \ref{stages:stages3}) for LLM-based predicate selection with explicit negation handling.
  • Figure 4: SapBERT-augmented RELATE pipeline, extending the pipeline in Figure 1. The ontology preprocessing now produces both LLM-generated embeddings and SapBERT fine-tuned embeddings. Additional changes include dual embedding of the relation text, dual similarity searches on the stored embeddings, and merging of the two top-$k$ candidate lists.
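
The retrieval and merging steps described in the Figure 1 and Figure 4 captions can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes cosine similarity as the retrieval score and a merge rule that takes the union of the two top-$k$ lists and keeps the higher score for predicates found in both. The captions do not specify either choice. The helper names (`cosine`, `top_k`, `merge_candidates`), the two-dimensional toy vectors, and the predicate names are all hypothetical; real embeddings would come from the LLM embedding model and from SapBERT.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Scored = List[Tuple[str, float]]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: Vector, index: Dict[str, Vector], k: int) -> Scored:
    """Return the k predicates whose stored embeddings are most similar to the query."""
    scored = [(name, cosine(query, vec)) for name, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def merge_candidates(first: Scored, second: Scored) -> Scored:
    """Union of two candidate lists, keeping the best score per predicate.

    Assumed merge rule; the source only states that the two lists are merged.
    """
    best: Dict[str, float] = {}
    for name, score in list(first) + list(second):
        if name not in best or score > best[name]:
            best[name] = score
    return sorted(best.items(), key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy predicate indexes standing in for the two stored embedding spaces.
    llm_index = {"treats": [1.0, 0.0], "causes": [0.0, 1.0], "not_treats": [-1.0, 0.0]}
    sap_index = {"treats": [1.0, 0.2], "affects": [0.7, 0.7], "causes": [0.1, 1.0]}

    # The relation text is embedded once per space (dual embedding).
    merged = merge_candidates(
        top_k([0.9, 0.1], llm_index, k=2),
        top_k([0.8, 0.6], sap_index, k=2),
    )
    for name, score in merged:
        print(f"{name}\t{score:.3f}")
```

Scores from two different embedding spaces are not directly comparable, so the ordering of the merged list should be treated as provisional. In the pipeline described above, the final ordering is produced by the LLM reranking stage, which sees the full abstract context.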