OEMA: Ontology-Enhanced Multi-Agent Collaboration Framework for Zero-Shot Clinical Named Entity Recognition

Xinli Tao, Xin Dong, Xuezhong Zhou

TL;DR

OEMA tackles zero-shot clinical NER by coordinating three agents—self-annotator, discriminator, and predictor—to generate and curate token-level, ontology-grounded examples. By incorporating SNOMED CT concepts into token-level similarity assessments and fusing entity-type descriptions with structured examples, OEMA narrows the gap between prompt design and self-improvement. Empirical results on MTSamples and VAERS show state-of-the-art exact-match performance and competitive relaxed-match results compared to supervised baselines, with robustness across GPT-3.5 and Gemini backbones. The framework reduces annotation costs while maintaining clinical relevance and interpretability, and it points to future extensions in continual learning, open-domain adaptation, and broader NLP tasks in the clinical domain.

Abstract

With the rapid expansion of unstructured clinical texts in electronic health records (EHRs), clinical named entity recognition (NER) has become a crucial technique for extracting medical information. However, traditional supervised models such as CRF and BioClinicalBERT depend on labeled data that is costly to annotate. Although zero-shot NER based on large language models (LLMs) reduces the dependency on labeled data, challenges remain in aligning example selection with task granularity and in effectively integrating prompt design with self-improvement frameworks. To address these limitations, we propose OEMA, a novel zero-shot clinical NER framework based on multi-agent collaboration. OEMA consists of three core components: (1) a self-annotator that autonomously generates candidate examples; (2) a discriminator that leverages SNOMED CT to filter token-level examples by clinical relevance; and (3) a predictor that incorporates entity-type descriptions to enhance inference accuracy. Experimental results on two benchmark datasets, MTSamples and VAERS, demonstrate that OEMA achieves state-of-the-art performance under exact-match evaluation. Moreover, under related-match criteria, OEMA performs comparably to the supervised BioClinicalBERT model while significantly outperforming the traditional CRF method. Future work will focus on continual learning and open-domain adaptation to expand its applicability in clinical NLP.

Paper Structure

This paper contains 23 sections, 6 equations, 4 figures, 8 tables.

Figures (4)

  • Figure 1: Challenge Analysis Diagram. In zero-shot learning, OEMA tackles two key challenges: (1) the mismatch between example selection and task granularity, and (2) the lack of effective integration between prompt design and the self-improvement framework.
  • Figure 2: Framework of OEMA: (1) Self-Annotator creates a corpus from unlabeled data; (2) Discriminator scores token-level examples via top-level SNOMED CT ontologies; (3) Predictor fuses entity-type descriptors with selected examples to yield the NER output. Agents are grouped by function (dashed boxes) and connected by arrows to show the execution order.
  • Figure 3: F1 scores (%) of Diversified KNN under different $k$ (left, with $K{=}12$) and $K$ (right, with $k{=}3$) settings on MTSamples and VAERS.
  • Figure 4: Two specific case analyses. Text marked in bold green indicates entities corrected by OEMA, text marked in italic red indicates incorrect entities, and text marked in underlined blue indicates entities from high-quality self-annotation examples that may help with error correction.
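
The execution order described in the Figure 2 caption (self-annotator, then discriminator, then predictor) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the LLM calls are replaced by deterministic stand-ins, `TOY_ONTOLOGY` is a hypothetical four-entry table standing in for top-level SNOMED CT categories, and every function name, the prompt layout, and the 0/1 scoring rule are assumptions made only to show how the three stages hand data to one another.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Entity = Tuple[str, str]  # (mention, entity_type)


@dataclass
class Example:
    sentence: str
    entities: List[Entity]


# Hypothetical stand-in for top-level SNOMED CT categories (not real ontology data).
TOY_ONTOLOGY: Dict[str, str] = {
    "fever": "clinical finding",
    "headache": "clinical finding",
    "ibuprofen": "pharmaceutical product",
    "aspirin": "pharmaceutical product",
}
# Hypothetical mapping from toy category to NER label.
TOY_TYPES: Dict[str, str] = {
    "clinical finding": "problem",
    "pharmaceutical product": "treatment",
}


def tokens(text: str) -> List[str]:
    """Lowercase whitespace tokenizer that drops periods."""
    return text.lower().replace(".", "").split()


def toy_annotator(sentence: str) -> List[Entity]:
    """Stand-in for the LLM self-annotator: a dictionary lookup per token."""
    return [(t, TOY_TYPES[TOY_ONTOLOGY[t]]) for t in tokens(sentence) if t in TOY_ONTOLOGY]


def self_annotate(unlabeled: List[str], annotate: Callable[[str], List[Entity]]) -> List[Example]:
    """Stage 1: build a self-annotated corpus from unlabeled sentences."""
    return [Example(s, annotate(s)) for s in unlabeled]


def toy_score(target: str, mention: str) -> float:
    """Assumed scoring rule: 1.0 if the mention shares a toy category with a target token."""
    category = TOY_ONTOLOGY.get(mention)
    target_categories = {TOY_ONTOLOGY.get(t) for t in tokens(target)}
    return 1.0 if category is not None and category in target_categories else 0.0


def discriminate(target: str, corpus: List[Example],
                 score: Callable[[str, str], float], top_n: int) -> List[Entity]:
    """Stage 2: rank token-level examples by score and keep the top_n."""
    candidates = [entity for example in corpus for entity in example.entities]
    ranked = sorted(candidates, key=lambda e: score(target, e[0]), reverse=True)
    return ranked[:top_n]


def build_prompt(target: str, type_descriptions: Dict[str, str], examples: List[Entity]) -> str:
    """Fuse entity-type descriptions with the selected examples (layout is an assumption)."""
    lines = ["Entity types:"]
    lines += [f"- {name}: {desc}" for name, desc in type_descriptions.items()]
    lines.append("Examples:")
    lines += [f"- {mention} -> {etype}" for mention, etype in examples]
    lines.append(f"Sentence: {target}")
    return "\n".join(lines)


def toy_llm(prompt: str) -> List[Entity]:
    """Stand-in for the predictor LLM: re-reads the target sentence from the prompt."""
    return toy_annotator(prompt.rsplit("Sentence: ", 1)[1])


def run_pipeline(unlabeled: List[str], target: str,
                 type_descriptions: Dict[str, str], top_n: int = 2) -> List[Entity]:
    """Stage 1 -> Stage 2 -> Stage 3, in the order the Figure 2 caption describes."""
    corpus = self_annotate(unlabeled, toy_annotator)
    selected = discriminate(target, corpus, toy_score, top_n)
    return toy_llm(build_prompt(target, type_descriptions, selected))
```

In a real system each stand-in (`toy_annotator`, `toy_score`, `toy_llm`) would be replaced by an LLM or ontology call; the sketch only fixes the data flow, in which the discriminator sees token-level examples rather than whole sentences and the predictor prompt carries both type descriptions and the filtered examples.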