Table of Contents
Fetching ...

A Semi-Automated Annotation Workflow for Paediatric Histopathology Reports Using Small Language Models

Avish Vijayaraghavan, Jaskaran Singh Kawatra, Sebin Sabu, Jonny Sheldon, Will Poulett, Alex Eze, Daniel Key, John Booth, Shiren Patel, Jonny Pearson, Dan Schofield, Jonathan Hope, Pavithra Rajendran, Neil Sebire

Abstract

Electronic Patient Record (EPR) systems contain valuable clinical information, but much of it is trapped in unstructured text, limiting its use for research and decision-making. Large language models can extract such information but require substantial computational resources to run locally, and sending sensitive clinical data to cloud-based services, even when deidentified, raises significant patient privacy concerns. In this study, we develop a resource-efficient semi-automated annotation workflow using small language models (SLMs) to extract structured information from unstructured EPR data, focusing on paediatric histopathology reports. As a proof-of-concept, we apply the workflow to paediatric renal biopsy reports, a domain chosen for its constrained diagnostic scope and well-defined underlying biology. We develop the workflow iteratively with clinical oversight across three meetings, manually annotating 400 reports from a dataset of 2,111 at Great Ormond Street Hospital as a gold standard, while developing an automated information extraction approach using SLMs. We frame extraction as a Question-Answering task grounded by clinician-guided entity guidelines and few-shot examples, evaluating five instruction-tuned SLMs with a disagreement modelling framework to prioritise reports for clinical review. Gemma 2 2B achieves the highest accuracy at 84.3%, outperforming off-the-shelf models including spaCy (74.3%), BioBERT-SQuAD (62.3%), RoBERTa-SQuAD (59.7%), and GLiNER (60.2%). Entity guidelines improved performance by 7-19% over the zero-shot baseline, and few-shot examples by 6-38%, though their benefits do not compound when combined. These results demonstrate that SLMs can extract structured information from specialised clinical domains on CPU-only infrastructure with minimal clinician involvement. Our code is available at https://github.com/gosh-dre/nlp_renal_biopsy.

A Semi-Automated Annotation Workflow for Paediatric Histopathology Reports Using Small Language Models

Abstract

Electronic Patient Record (EPR) systems contain valuable clinical information, but much of it is trapped in unstructured text, limiting its use for research and decision-making. Large language models can extract such information but require substantial computational resources to run locally, and sending sensitive clinical data to cloud-based services, even when deidentified, raises significant patient privacy concerns. In this study, we develop a resource-efficient semi-automated annotation workflow using small language models (SLMs) to extract structured information from unstructured EPR data, focusing on paediatric histopathology reports. As a proof-of-concept, we apply the workflow to paediatric renal biopsy reports, a domain chosen for its constrained diagnostic scope and well-defined underlying biology. We develop the workflow iteratively with clinical oversight across three meetings, manually annotating 400 reports from a dataset of 2,111 at Great Ormond Street Hospital as a gold standard, while developing an automated information extraction approach using SLMs. We frame extraction as a Question-Answering task grounded by clinician-guided entity guidelines and few-shot examples, evaluating five instruction-tuned SLMs with a disagreement modelling framework to prioritise reports for clinical review. Gemma 2 2B achieves the highest accuracy at 84.3%, outperforming off-the-shelf models including spaCy (74.3%), BioBERT-SQuAD (62.3%), RoBERTa-SQuAD (59.7%), and GLiNER (60.2%). Entity guidelines improved performance by 7-19% over the zero-shot baseline, and few-shot examples by 6-38%, though their benefits do not compound when combined. These results demonstrate that SLMs can extract structured information from specialised clinical domains on CPU-only infrastructure with minimal clinician involvement. Our code is available at https://github.com/gosh-dre/nlp_renal_biopsy.

Paper Structure

This paper contains 40 sections, 16 figures, 5 tables.

Figures (16)

  • Figure 1: The general histopathology workflow. Our focus is on extracting key text spans from the clinician's report that are relevant for diagnosis.
  • Figure 1: Streamlit app used for QA annotation of reports, displaying three of the nine total entities: cortex presence, medulla presence, and number of total glomeruli. Sensitive patient information in the report has been redacted.
  • Figure 2: Annotation workflow for medical reports which three expert touchpoints highlighted with thick blue lines. The second and third touchpoints have potential to be indefinitely long and so we have specified the assumption of minimal expert time.
  • Figure 2: Example report in Streamlit comparison app showing the predictions of two models for cortex presence and medulla presence entities along with the comment box for qualitative error analysis.
  • Figure 3: Example entity extraction prompt for two entities.
  • ...and 11 more figures