
ELM: Ensemble of Language Models for Predicting Tumor Group from Pathology Reports

Lovedeep Gondara, Jonathan Simkin, Shebnum Devji, Gregory Arbour, Raymond Ng

TL;DR

The paper tackles the substantial manual burden of assigning tumor groups from unstructured pathology reports in population-based cancer registries (PBCRs). It introduces ELM, an ensemble of six fine-tuned small language models (SLMs), three reading the top part of each report and three reading the bottom part, combined with a large language model (LLM) that arbitrates ambiguous cases. ELM achieves an average precision and recall of 0.94 across 19 tumor groups, outperforming single-model baselines and SLM-only ensembles, and it demonstrates real-world impact by saving hundreds of person-hours at the BC Cancer Registry. The study shows that a hybrid SLM+LLM pipeline can deliver state-of-the-art tumor-group classification in a PBCR setting and can be adapted to other registries with similar data pipelines.

Abstract

Population-based cancer registries (PBCRs) face a significant bottleneck in manually extracting data from unstructured pathology reports, a process crucial for tasks like tumor group assignment, which can consume 900 person-hours for approximately 100,000 reports. To address this, we introduce ELM (Ensemble of Language Models), a novel ensemble-based approach leveraging both small language models (SLMs) and large language models (LLMs). ELM uses six fine-tuned SLMs: three process the top part of the pathology report and three process the bottom part, which maximizes report coverage. ELM requires five-out-of-six agreement to assign a tumor group; disagreements are arbitrated by an LLM with a carefully curated prompt. Our evaluation across nineteen tumor groups demonstrates that ELM achieves an average precision and recall of 0.94, outperforming single-model and ensemble-without-LLM approaches. Deployed at the British Columbia Cancer Registry, ELM demonstrates how LLMs can be successfully applied in a PBCR setting to achieve state-of-the-art results and significantly enhance operational efficiency, saving hundreds of person-hours annually.
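The top/bottom split can be illustrated with a minimal sketch. The abstract does not say how the two parts are defined, so this example assumes each SLM has a fixed input window (a hypothetical `max_len` of 512 tokens, common for BERT-style encoders) and uses pre-split tokens for simplicity. The hypothetical helper `split_top_bottom` keeps the first and last `max_len` tokens, so any report up to twice the window length is covered in full by the two views.

```python
from typing import List, Tuple


def split_top_bottom(tokens: List[str], max_len: int = 512) -> Tuple[List[str], List[str]]:
    """Return the top (first max_len) and bottom (last max_len) tokens of a report.

    Hypothetical helper: the window size is an assumption, not a value from the paper.
    Reports shorter than max_len yield the same full token list for both views.
    """
    if max_len <= 0:
        raise ValueError("max_len must be positive")
    top = tokens[:max_len]        # view for the three "top-part" SLMs
    bottom = tokens[-max_len:]    # view for the three "bottom-part" SLMs
    return top, bottom


def covers_whole_report(n_tokens: int, max_len: int = 512) -> bool:
    """True when the two views together include every token of the report."""
    return n_tokens <= 2 * max_len


if __name__ == "__main__":
    report = [f"t{i}" for i in range(1000)]  # stand-in for a tokenized report
    top, bottom = split_top_bottom(report)
    print(top[0], bottom[0], bottom[-1], covers_whole_report(len(report)))
```

For reports longer than `2 * max_len`, the middle section remains unseen; under this assumption the split trades complete coverage for keeping each model within its input limit.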

Paper Structure

This paper contains 17 sections, 1 figure, 5 tables, and 1 algorithm.

Figures (1)

  • Figure 1: ELM in action. A pathology report is sent to six small language models for classification. After the votes are summed per class, if the top class receives fewer votes than the threshold (5 in this case) or the predicted tumor group is among the categories that are harder to classify, the report is sent to the LLM with an appropriate prompt directing it to select a tumor group based on the knowledge of subject matter experts and the classes predicted by the small language models.
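The routing rule in the Figure 1 caption can be sketched in Python. This is a minimal illustration, not the authors' implementation: `route_report`, `RoutingDecision`, and `build_arbitration_prompt` are hypothetical names; the set of hard-to-classify groups is supplied by the caller because the caption does not list them; the prompt text is a placeholder for the curated prompt the paper describes; and the tumor group labels in the example are illustrative.

```python
from collections import Counter
from dataclasses import dataclass
from typing import FrozenSet, List, Optional, Sequence


@dataclass
class RoutingDecision:
    label: Optional[str]    # accepted tumor group, or None when deferred to the LLM
    send_to_llm: bool       # True when the LLM must arbitrate
    candidates: List[str]   # classes predicted by the SLMs, most-voted first


def route_report(
    slm_predictions: Sequence[str],
    threshold: int = 5,
    hard_groups: FrozenSet[str] = frozenset(),
) -> RoutingDecision:
    """Sum SLM votes per class; defer to the LLM if the vote is weak or the group is hard."""
    if not slm_predictions:
        raise ValueError("need at least one SLM prediction")
    votes = Counter(slm_predictions)
    ranked = [cls for cls, _ in votes.most_common()]
    top_class, top_votes = votes.most_common(1)[0]
    if top_votes < threshold or top_class in hard_groups:
        return RoutingDecision(None, True, ranked)
    return RoutingDecision(top_class, False, ranked)


def build_arbitration_prompt(report_text: str, candidates: Sequence[str], expert_guidance: str) -> str:
    """Assemble a placeholder prompt with expert guidance and the SLM candidate classes."""
    options = "\n".join(f"- {c}" for c in candidates)
    return (
        "You are assisting a cancer registry with tumor group assignment.\n"
        f"Guidance from subject matter experts:\n{expert_guidance}\n\n"
        f"Tumor groups predicted by the small language models:\n{options}\n\n"
        f"Pathology report:\n{report_text}\n\n"
        "Answer with exactly one tumor group."
    )


if __name__ == "__main__":
    decision = route_report(["lung"] * 4 + ["gastrointestinal"] * 2)
    if decision.send_to_llm:
        print(build_arbitration_prompt("<report text>", decision.candidates, "<expert notes>"))
    else:
        print(decision.label)
```

With this structure, confident majority votes are accepted directly and only reports that fail the threshold or fall in a hard group incur an LLM call, receiving the SLM candidate classes as context in the way the caption describes.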