AutoPCR: Automated Phenotype Concept Recognition by Prompting

Yicheng Tao, Yuanhao Huang, Yiqun Wang, Xin Luo, Jie Liu

TL;DR

AutoPCR is a prompt-based phenotype CR method that requires no ontology-specific training. It achieves the best average and most robust performance across both mention-level and document-level evaluations, surpassing prior state-of-the-art methods.

Abstract

Phenotype concept recognition (CR) is a fundamental task in biomedical text mining, enabling applications such as clinical diagnostics and knowledge graph construction. However, existing methods often require ontology-specific training and struggle to generalize across diverse text types and evolving biomedical terminology. We present AutoPCR, a prompt-based phenotype CR method that does not require ontology-specific training. AutoPCR performs CR in three stages: entity extraction using a hybrid of rule-based and neural tagging strategies, candidate retrieval via SapBERT, and entity linking through prompting a large language model. Experiments on four benchmark datasets show that AutoPCR achieves the best average and most robust performance across both mention-level and document-level evaluations, surpassing prior state-of-the-art methods. Further ablation and transfer studies demonstrate its inductive capability and generalizability to new ontologies.
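The three stages described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the lexicon lookup stands in for the hybrid rule-based and neural tagging, character-trigram cosine similarity stands in for SapBERT embeddings, and the `choose` callback stands in for the LLM prompt. All function names, the `DEMO:` concept identifiers, and the toy ontology are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt
from typing import Callable, Optional


@dataclass(frozen=True)
class Concept:
    concept_id: str  # hypothetical identifier, not a real ontology ID
    name: str


def embed(text: str) -> dict[str, float]:
    """Toy stand-in for a SapBERT encoder: character-trigram counts."""
    padded = f"  {text.lower()} "
    vec: dict[str, float] = {}
    for i in range(len(padded) - 2):
        gram = padded[i:i + 3]
        vec[gram] = vec.get(gram, 0.0) + 1.0
    return vec


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    dot = sum(v * b.get(key, 0.0) for key, v in a.items())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extract_entities(text: str, lexicon: list[str]) -> list[str]:
    """Stage 1 stand-in: plain lexicon lookup instead of hybrid tagging."""
    lowered = text.lower()
    return [term for term in lexicon if term in lowered]


def retrieve_candidates(entity: str, ontology: list[Concept], k: int) -> list[Concept]:
    """Stage 2: rank ontology concepts by similarity and keep the top k."""
    query = embed(entity)
    ranked = sorted(ontology, key=lambda c: cosine(query, embed(c.name)), reverse=True)
    return ranked[:k]


Chooser = Callable[[str, list[Concept]], Optional[Concept]]


def recognize(text: str, lexicon: list[str], ontology: list[Concept],
              k: int, choose: Chooser) -> list[tuple[str, Concept]]:
    """Run extraction, retrieval, and linking; `choose` plays the LLM's role."""
    results = []
    for entity in extract_entities(text, lexicon):
        candidates = retrieve_candidates(entity, ontology, k)
        concept = choose(entity, candidates)  # Stage 3: may reject all candidates
        if concept is not None:
            results.append((entity, concept))
    return results


def pick_first(entity: str, candidates: list[Concept]) -> Optional[Concept]:
    """Placeholder linker: accept the top-ranked candidate, if any."""
    return candidates[0] if candidates else None


ontology = [
    Concept("DEMO:0001", "seizure"),
    Concept("DEMO:0002", "short stature"),
    Concept("DEMO:0003", "hearing impairment"),
]
lexicon = ["seizures", "short stature"]
pairs = recognize("The patient has seizures and short stature.",
                  lexicon, ontology, k=2, choose=pick_first)
for entity, concept in pairs:
    print(entity, "->", concept.concept_id, concept.name)
```

Passing the linker as a callback keeps the sketch independent of any particular LLM client. In a real system `choose` would format the entity and its candidates into a prompt and parse the model's answer, and it could return `None` to reject every candidate.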

Paper Structure

This paper contains 17 sections, 4 equations, 3 figures, and 6 tables.

Figures (3)

  • Figure 1: Architecture of AutoPCR. It performs concept recognition in three stages: entity extraction using BioNER and syntactic entity extraction (e.g., extracted entities $e_1$, $e_2$, and $e_3$), candidate concept retrieval via SapBERT initialized from PubMedBERT (e.g., retrieved concepts $c_1$ and $c_2$ for $e_3$), and entity linking through prompting an LLM (e.g., linked concept $c_1$).
  • Figure 2: Sensitivity analysis of AutoPCR w.r.t. $\tau_1$ shows robustness.
  • Figure 3: Sensitivity analysis of AutoPCR w.r.t. $\tau_2$ and $k$ shows robustness.

Theorems & Definitions (4)

  • Definition 1: Concept recognition
  • Definition 2: Entity extraction
  • Definition 3: Candidate concept retrieval
  • Definition 4: Entity linking