Semi-Supervised Learning from Small Annotated Data and Large Unlabeled Data for Fine-grained PICO Entity Recognition
Fangyi Chen, Gongbo Zhang, Yilu Fang, Yifan Peng, Chunhua Weng
TL;DR
This work tackles the problem of extracting fine-grained PICO entities from randomized controlled trial texts, where labeled data are scarce and annotation guidelines vary. It proposes FinePICO, a semi-supervised learning framework that starts from a small labeled set, generates high-confidence pseudo-labels for a large unlabeled corpus, and iteratively refines a BiomedBERT-based NER model while applying multiple quality-enhancement strategies to curb error propagation. Under limited annotation, the approach improves substantially on the supervised baseline trained on the same small labeled set (an F1 gain of roughly 16 percentage points, from 0.437 to 0.60). It also generalizes to a revised PICO scheme and to an external dataset, with statistically significant improvements (p-values ranging from below 0.05 to below 0.001, depending on the setting). By leveraging cross-domain unlabeled data and robust pseudo-label filtering, FinePICO offers a practical path to scalable, fine-grained PICO extraction for evidence synthesis and meta-analysis, reducing annotation burden while preserving performance. The work also highlights remaining challenges in boundary detection and arm-level value disambiguation, and points to contextual modeling across sentence boundaries as a direction for future work.
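The pseudo-labeling loop summarized above can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the authors' implementation: the function names `select_pseudo_labels` and `self_train`, the 0.9 confidence threshold, the rule of keeping a sentence only when its least-confident token clears the threshold, and the `train_fn`/`predict_fn` callables, which stand in for whatever NER model is used. The quality-enhancement strategies the summary mentions are not specified here, so a single confidence filter takes their place.

```python
from typing import Any, Callable, List, Tuple

Tagged = List[Tuple[str, str]]            # (token, label) pairs
Predicted = List[Tuple[str, str, float]]  # (token, label, confidence) triples


def select_pseudo_labels(predictions: List[Predicted],
                         threshold: float = 0.9) -> List[Tagged]:
    """Keep sentences whose least-confident token meets the threshold.

    The sentence-level minimum is a hypothetical filtering rule, chosen
    only to illustrate high-confidence pseudo-label selection.
    """
    kept = []
    for sent in predictions:
        if sent and min(conf for _, _, conf in sent) >= threshold:
            # Drop the confidence so the sentence looks like gold data.
            kept.append([(tok, lab) for tok, lab, _ in sent])
    return kept


def self_train(labeled: List[Tagged],
               unlabeled: List[List[str]],
               train_fn: Callable[[List[Tagged]], Any],
               predict_fn: Callable[[Any, List[str]], Predicted],
               rounds: int = 3,
               threshold: float = 0.9) -> Any:
    """Generic self-training: train, pseudo-label, filter, retrain."""
    model = train_fn(labeled)
    for _ in range(rounds):
        predictions = [predict_fn(model, tokens) for tokens in unlabeled]
        pseudo = select_pseudo_labels(predictions, threshold)
        if not pseudo:
            break  # nothing confident enough to add in this round
        # Retrain on gold data plus the currently trusted pseudo-labels.
        model = train_fn(labeled + pseudo)
    return model
```

In this sketch the pseudo-labeled set is rebuilt from scratch each round rather than accumulated, so a sentence the model later becomes unsure about drops out again. That is one simple way to limit error propagation, and other designs are equally plausible.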
Abstract
Objective: Extracting PICO elements (Participants, Intervention, Comparison, and Outcomes) from clinical trial literature is essential for clinical evidence retrieval, appraisal, and synthesis. Existing approaches do not distinguish the attributes of PICO entities. This study aims to develop a named entity recognition (NER) model that extracts PICO entities at a fine granularity. Materials and Methods: Using a corpus of 2,511 abstracts with PICO mentions from 4 public datasets, we developed a semi-supervised method that trains a NER model, FinePICO, by combining limited annotated data of PICO entities with abundant unlabeled data. For evaluation, we divided the entire dataset into two subsets: a smaller subset with annotations and a larger subset without annotations. We then established theoretical lower and upper performance bounds from supervised learning models trained solely on the small annotated subset and on the entire set with complete annotations, respectively. Finally, we evaluated FinePICO on both the smaller annotated subset and the larger, initially unannotated subset. We measured the performance of FinePICO using precision, recall, and F1. Results: Our method achieved precision/recall/F1 of 0.567/0.636/0.60, respectively, using a small set of annotated samples, outperforming the baseline model (F1: 0.437) by more than 16 percentage points. The model generalizes to a different PICO framework and to another corpus, consistently outperforming the benchmark across diverse experimental settings (p < 0.001). Conclusion: This study contributes a generalizable and effective semi-supervised approach to named entity recognition that leverages large unlabeled data together with small annotated data. It also provides initial support for fine-grained PICO extraction.
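As a quick check on the reported metrics, the sketch below computes F1 as the harmonic mean of precision and recall and compares the result with the baseline. The helper names `f1_score` and `prf` are illustrative, and scoring entities by exact match on (start, end, type) tuples is an assumption, since the abstract does not state the matching criterion.

```python
from typing import Set, Tuple

Entity = Tuple[int, int, str]  # (start, end, type); exact match is assumed


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def prf(gold: Set[Entity], predicted: Set[Entity]) -> Tuple[float, float, float]:
    """Entity-level precision, recall, and F1 under exact matching."""
    tp = len(gold & predicted)
    precision = tp / len(predicted) if predicted else 0.0
    recall = tp / len(gold) if gold else 0.0
    return precision, recall, f1_score(precision, recall)


# Values reported in the abstract.
finepico_f1 = f1_score(0.567, 0.636)
baseline_f1 = 0.437

# Gap expressed two ways: in absolute F1 and relative to the baseline.
absolute_gain = finepico_f1 - baseline_f1
relative_gain = absolute_gain / baseline_f1

print(f"F1={finepico_f1:.3f} abs={absolute_gain:.3f} rel={relative_gain:.1%}")
```

The reported precision and recall are consistent with an F1 of about 0.60. The gap of just over 0.16 to the baseline is an absolute difference in F1, which corresponds to a relative improvement of roughly 37%.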
