Hierarchical Retrieval with Out-Of-Vocabulary Queries: A Case Study on SNOMED CT

Jonathon Dilworth, Hui Yang, Jiaoyan Chen, Yongsheng Gao

TL;DR

This work addresses the retrieval of SNOMED CT concepts for out-of-vocabulary (OOV) queries, i.e., queries with no equivalent match in the ontology, by framing the task as hierarchical retrieval (HR). It uses language-model-based ontology embeddings in hyperbolic space, HiT and OnT, to infer subsumption relationships and rank candidate concepts with a depth-biased score. In experiments on an OOV dataset derived from MIRAGE, OnT consistently outperforms SBERT and the lexical baselines, with strong gains as the permissible hop distance $d$ increases. Although evaluated on SNOMED CT, the approach is generalisable to other ontologies, offering practical benefits for clinical decision support and terminology navigation. The authors release code, tools, and data for reuse.

Abstract

SNOMED CT is a large-scale biomedical ontology that organises its concepts hierarchically. Knowledge retrieval in SNOMED CT is critical for its application, but often proves challenging due to language ambiguity, synonymy, and polysemy. This problem is exacerbated when queries are out-of-vocabulary (OOV), i.e., have no equivalent match in the ontology. In this work, we focus on hierarchical concept retrieval from SNOMED CT with OOV queries and propose an approach based on language-model-based ontology embeddings. For evaluation, we construct OOV queries annotated against SNOMED CT concepts, testing retrieval of the most direct subsumers and their less relevant ancestors. We find that our method outperforms the baselines, including SBERT and two lexical matching methods. While evaluated against SNOMED CT, the approach is generalisable and can be extended to other ontologies. We release code, tools, and evaluation datasets at https://github.com/jonathondilworth/HR-OOV.

Paper Structure

This paper contains 18 sections, 1 equation, 3 figures, 3 tables.

Figures (3)

  • Figure 1: SNOMED CT concept hierarchy fragment showing hierarchical structure and OOV query positioning. The OOV query "tingling pins sensation" (grey) represents a prospective conceptualisation that may be subsumed under Pins and needles, which is subsumed by Paresthesia.
  • Figure 2: Depth intuition within the Poincaré Ball. Concepts that capture generality remain nearer the origin (i.e., the depth remains shallow), whereas those with greater specificity are positioned towards the circumference.
  • Figure 3: Architecture for embedding and retrieval using HiT and OnT. Both approaches map ontology concepts to hyperbolic space via text-based representations. Hyperbolic distance and subsumption score are applied for HR, enabling an exhaustive ranking for OOV search.
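The captions for Figures 2 and 3 can be made concrete with a small sketch. The code below is a minimal illustration, not the authors' implementation. It assumes a unit Poincaré ball (curvature -1), uses hand-picked 2-D coordinates rather than HiT or OnT embeddings, and uses one plausible form of a depth-biased subsumption score. The exact score used in the paper is not given in this section. The names `depth`, `subsumption_score`, `rank`, and the weight `lam` are hypothetical.

```python
import math


def poincare_distance(u, v):
    """Geodesic distance between two points in the unit Poincaré ball."""
    sq_u = sum(a * a for a in u)
    sq_v = sum(b * b for b in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def depth(x):
    """Hyperbolic distance from the origin, used here as a proxy for specificity."""
    return poincare_distance([0.0] * len(x), x)


def subsumption_score(child, parent, lam=0.5):
    """Illustrative depth-biased score (an assumption, not the paper's formula).

    The score is higher when the candidate parent is close to the child and
    lies nearer the origin, i.e. is more general than the child.
    """
    return -(poincare_distance(child, parent) + lam * (depth(parent) - depth(child)))


def rank(query, concepts, lam=0.5):
    """Exhaustively rank every concept as a candidate subsumer of the query."""
    return sorted(
        concepts,
        key=lambda name: subsumption_score(query, concepts[name], lam),
        reverse=True,
    )


# Toy coordinates chosen by hand for illustration only.
concepts = {
    "Paresthesia": [0.30, 0.00],        # general: near the origin
    "Pins and needles": [0.60, 0.05],   # more specific: further out
    "Fracture of bone": [-0.60, 0.10],  # unrelated: opposite side of the ball
}
query = [0.80, 0.08]  # stand-in for the OOV query "tingling pins sensation"

ranking = rank(query, concepts)
print(ranking)
```

With these toy values, the closest more general concept ranks first, its ancestor second, and the unrelated concept last. In a real setting the coordinates would come from a trained encoder, and the weight on the depth term would need tuning.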