OEMA: Ontology-Enhanced Multi-Agent Collaboration Framework for Zero-Shot Clinical Named Entity Recognition
Xinli Tao, Xin Dong, Xuezhong Zhou
TL;DR
OEMA tackles zero-shot clinical NER by coordinating three agents (a self-annotator, a discriminator, and a predictor) that generate and curate token-level, ontology-grounded examples. By incorporating SNOMED CT concepts into token-level similarity assessment and combining entity-type descriptions with structured examples, OEMA narrows the gap between prompt design and self-improvement. Empirical results on MTSamples and VAERS show state-of-the-art exact-match performance and related-match results comparable to the supervised BioClinicalBERT baseline, with robustness across GPT-3.5 and Gemini backbones. The framework reduces annotation costs while maintaining clinical relevance and interpretability, and it points to future extensions in continual learning, open-domain adaptation, and broader NLP tasks in the clinical domain.
Abstract
With the rapid expansion of unstructured clinical text in electronic health records (EHRs), clinical named entity recognition (NER) has become a crucial technique for extracting medical information. However, traditional supervised models such as CRF and BioClinicalBERT require costly manually annotated training data. Although zero-shot NER based on large language models (LLMs) reduces the dependency on labeled data, two challenges remain: aligning example selection with the granularity of the task, and effectively integrating prompt design with self-improvement frameworks. To address these limitations, we propose OEMA, a novel zero-shot clinical NER framework based on multi-agent collaboration. OEMA consists of three core components: (1) a self-annotator that autonomously generates candidate examples; (2) a discriminator that leverages SNOMED CT to filter token-level examples by clinical relevance; and (3) a predictor that incorporates entity-type descriptions to enhance inference accuracy. Experimental results on two benchmark datasets, MTSamples and VAERS, demonstrate that OEMA achieves state-of-the-art performance under exact-match evaluation. Under related-match criteria, OEMA performs comparably to the supervised BioClinicalBERT model and significantly outperforms the traditional CRF method. By this measure, OEMA brings zero-shot clinical NER close to supervised performance. Future work will focus on continual learning and open-domain adaptation to expand its applicability in clinical NLP.
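The three-component flow described above can be illustrated with a minimal sketch. It is not the authors' implementation. Everything in it is an assumption made for illustration: the `TOY_ONTOLOGY` dictionary stands in for SNOMED CT, `toy_annotate` stands in for an LLM-based self-annotator, Jaccard overlap of concept categories stands in for the token-level similarity assessment, and the entity types `problem` and `treatment` and their descriptions are hypothetical. The predictor step is reduced to building the prompt that would be sent to an LLM.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a SNOMED CT lookup (token -> concept category).
# These entries are illustrative only and are not taken from SNOMED CT.
TOY_ONTOLOGY = {
    "fever": "finding",
    "headache": "finding",
    "ibuprofen": "substance",
    "aspirin": "substance",
}


@dataclass
class Example:
    text: str
    entities: dict  # entity span -> entity type


def concepts(text):
    """Return the toy concept categories of the tokens in `text`."""
    tokens = (tok.strip(".,;:").lower() for tok in text.split())
    return {TOY_ONTOLOGY[tok] for tok in tokens if tok in TOY_ONTOLOGY}


def self_annotate(sentences, annotate):
    """Self-annotator: label unlabeled sentences to form a candidate pool."""
    return [Example(s, annotate(s)) for s in sentences]


def discriminate(target, candidates, top_k=1):
    """Discriminator: keep the candidates whose concepts best match the target."""
    target_concepts = concepts(target)

    def score(example):
        other = concepts(example.text)
        union = target_concepts | other
        # Jaccard overlap of concept categories (an assumed similarity measure).
        return len(target_concepts & other) / len(union) if union else 0.0

    return sorted(candidates, key=score, reverse=True)[:top_k]


def build_prompt(target, examples, type_descriptions):
    """Predictor input: entity-type descriptions plus the selected examples."""
    lines = ["Entity types:"]
    lines += [f"- {name}: {desc}" for name, desc in type_descriptions.items()]
    lines.append("Examples:")
    lines += [f"Text: {ex.text} -> {ex.entities}" for ex in examples]
    lines.append(f"Text: {target} ->")
    return "\n".join(lines)


def toy_annotate(sentence):
    """Hypothetical annotator; a real system would call an LLM here."""
    label = {"finding": "problem", "substance": "treatment"}
    found = {}
    for tok in sentence.split():
        key = tok.strip(".,;:").lower()
        if key in TOY_ONTOLOGY:
            found[key] = label[TOY_ONTOLOGY[key]]
    return found


pool = self_annotate(["Patient reports fever.", "Started aspirin daily."], toy_annotate)
target = "She has a headache."
selected = discriminate(target, pool, top_k=1)
prompt = build_prompt(
    target,
    selected,
    {"problem": "a symptom or diagnosis", "treatment": "a drug or procedure"},
)
print(prompt)
```

In this toy run the discriminator keeps the fever sentence for the headache target because both map to the same concept category, which is the kind of ontology-guided example selection the abstract describes at a much larger scale.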
