A Span-based Model for Extracting Overlapping PICO Entities from RCT Publications

Gongbo Zhang, Yiliang Zhou, Yan Hu, Hua Xu, Chunhua Weng, Yifan Peng

TL;DR

PICOX addresses the problem of extracting overlapping PICO entities from RCT publications. It uses a two-step span-based approach: (i) span localization, which tags each token with start/end/inside/outside/both-start-and-end cues for potential spans, and (ii) span classification, in which a multi-label predictor assigns PICO types to each valid span. A data augmentation scheme with composite spans further reduces false positives. Evaluated on four datasets (EBM-NLP, PICO-Corpus, AD, COVID-19), PICOX yields higher micro F1 scores and handles overlaps better than the baseline, with statistically significant gains on several metrics (e.g., micro F1 of $45.05\rightarrow50.87$ on EBM-NLP and $77.10\rightarrow80.32$ on COVID-19, and $28.15\rightarrow32.97$ for overlapped entities). The approach does not impose a fixed limit on span width. The authors also discuss limitations and future directions, including joint training and larger models to further improve precision and recall on challenging spans.

Abstract

Objectives: Extraction of PICO (Populations, Interventions, Comparison, and Outcomes) entities is fundamental to evidence retrieval. We present a novel method, PICOX, to extract overlapping PICO entities.

Materials and Methods: PICOX first identifies entities by assessing whether a word marks the beginning or end of an entity. It then uses a multi-label classifier to assign one or more PICO labels to each span candidate. PICOX was compared against one of the best-performing baselines on EBM-NLP and on three additional datasets (PICO-Corpus, and RCT publications on Alzheimer's Disease (AD) and on COVID-19), using entity-level precision, recall, and F1 scores.

Results: On EBM-NLP, PICOX achieved higher precision, recall, and F1 scores than the baseline across the board, with the micro F1 score improving from 45.05 to 50.87 (p << 0.01). On the PICO-Corpus, PICOX obtained higher recall and F1 scores than the baseline and improved the micro recall score from 56.66 to 67.33. On the COVID-19 dataset, PICOX also outperformed the baseline and improved the micro F1 score from 77.10 to 80.32. On the AD dataset, PICOX achieved F1 scores comparable to the baseline with higher precision.

Conclusion: PICOX excels at identifying overlapping entities and consistently surpasses a leading baseline across multiple datasets. Ablation studies show that its data augmentation strategy reduces false positives and improves precision.
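The entity-level micro scores reported above can be illustrated with a minimal sketch. It assumes exact matching on (start, end, label) triples and pools counts over all documents; the matching criterion actually used in the paper is not stated in this summary, and the names `Entity` and `micro_prf` are hypothetical.

```python
from typing import List, Tuple

# Hypothetical representation: (start token index, end token index, PICO label).
Entity = Tuple[int, int, str]


def micro_prf(gold_docs: List[List[Entity]],
              pred_docs: List[List[Entity]]) -> Tuple[float, float, float]:
    """Micro-averaged precision, recall, and F1 over all documents.

    Assumption: an entity counts as correct only if its start, end, and
    label all match a gold entity exactly.
    """
    tp = fp = fn = 0
    for gold, pred in zip(gold_docs, pred_docs):
        gold_set, pred_set = set(gold), set(pred)
        tp += len(gold_set & pred_set)   # predicted and in gold
        fp += len(pred_set - gold_set)   # predicted but not in gold
        fn += len(gold_set - pred_set)   # in gold but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Overlapping entities are distinct triples, so a nested intervention
# inside a population span is scored independently of the outer span.
gold = [[(0, 5, "P"), (2, 3, "I")]]
pred = [[(0, 5, "P"), (1, 3, "I")]]
print(micro_prf(gold, pred))
```

Because counts are pooled before dividing, frequent entity types weigh more heavily in the micro scores than rare ones.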

Paper Structure

This paper contains 19 sections, 4 equations, 4 figures, 6 tables, and 1 algorithm.

Figures (4)

  • Figure 1: An example sentence that contains overlapping PICO entities. The population entity is contained within the outcome entity.
  • Figure 2: The workflow of PICOX consists of two steps. First, it detects the start and end positions in the text sequence. Then, it classifies each span bounded by a pair of start and end positions. In the provided example from PMID 20840173, PICOX detected two potential start positions and two end positions, resulting in four valid pairs in which the start position does not come after the end position. After span classification, PICOX identified a population entity "A total of 403 adult ... to bupropion use" and an intervention entity "bupropion SR". In this instance, the intervention entity is nested within the population entity.
  • Figure 3: Comparison of two implementations, one with the data augmentation strategy and one without. Performance was evaluated on EBM-NLP. P: precision. R: recall. F1: F1 score.
  • Figure 4: Examples of PICO extraction by PICOX and the baseline.
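
The two-step workflow described in the Figure 2 caption can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors' implementation: the function names, the toy scoring function, and the 0.5 threshold are all hypothetical, and the real model would produce positions and label probabilities from a trained encoder.

```python
from typing import Callable, Dict, Iterable, List, Set, Tuple

Span = Tuple[int, int]  # inclusive (start, end) token indices


def candidate_spans(starts: Iterable[int], ends: Iterable[int]) -> List[Span]:
    """Pair every predicted start with every predicted end,
    keeping only pairs where the start does not come after the end."""
    return [(s, e) for s in sorted(starts) for e in sorted(ends) if s <= e]


def label_spans(spans: Iterable[Span],
                score_fn: Callable[[Span], Dict[str, float]],
                threshold: float = 0.5) -> Dict[Span, Set[str]]:
    """Multi-label step: keep every PICO type whose score reaches the
    threshold (assumed value). Spans with no label are dropped."""
    labeled: Dict[Span, Set[str]] = {}
    for span in spans:
        labels = {t for t, p in score_fn(span).items() if p >= threshold}
        if labels:
            labeled[span] = labels
    return labeled


# Toy stand-in for a trained classifier (hypothetical scores).
toy_scores = {
    (0, 20): {"P": 0.9, "I": 0.1},   # outer population span
    (10, 11): {"P": 0.2, "I": 0.8},  # nested intervention span
}


def toy_score_fn(span: Span) -> Dict[str, float]:
    return toy_scores.get(span, {"P": 0.0, "I": 0.0})


# Two starts and two ends give four valid pairs, as in the caption.
spans = candidate_spans(starts=[0, 10], ends=[11, 20])
print(spans)
print(label_spans(spans, toy_score_fn))
```

Since each span is classified independently, a nested span and the span that contains it can both be kept, which is what allows overlapping entities to be extracted.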