Domain Adaptation using Silver Standard Labels for Ki-67 Scoring in Digital Pathology: A Step Closer to Widescale Deployment

Amanda Dy, Ngoc-Nhu Jennifer Nguyen, Seyed Hossein Mirjahanmardi, Melanie Dawe, Anthony Fyles, Wei Shi, Fei-Fei Liu, Dimitrios Androutsos, Susan Done, April Khademi

TL;DR

This work tackles domain shift in automated Ki-67 proliferation index (PI) scoring for digital pathology by introducing an unsupervised domain adaptation pipeline that generates silver-standard (SS, pseudo) labels in the target domain. The method pre-trains two Ki-67 quantification architectures, UV-Net and piNET, on target-domain SS data and then fine-tunes them on source-domain gold-standard (GS) data (the SS+GS regime), which achieves the best target performance and consistency. Across source and target domains, the SS+GS configuration yields higher PI accuracy (mean around $95.9\%$ for piNET) and substantially lower $\Delta PI$ errors (reduced from about $7.5\%$ to near $4\%$), with t-SNE evidence of a reduced domain gap. This per-site calibration approach brings automated Ki-67 scoring tools closer to widescale deployment by avoiding manual labeling at every new site while maintaining robust accuracy.
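To make the two reported metrics concrete, here is a minimal sketch of how a PI and a $\Delta PI$ could be computed from cell counts. It assumes the standard definition of the Ki-67 PI as the percentage of positively stained tumour cells, and it assumes $\Delta PI$ is the absolute difference between a predicted and a reference PI; neither formula is spelled out in this summary, and the function names are hypothetical.

```python
def proliferation_index(n_positive: int, n_negative: int) -> float:
    """Return the PI in percent: Ki-67 positive cells over all counted tumour cells.

    Assumption: standard PI definition; not quoted from the paper.
    """
    total = n_positive + n_negative
    if total == 0:
        raise ValueError("no cells counted, PI is undefined")
    return 100.0 * n_positive / total


def delta_pi(pi_predicted: float, pi_reference: float) -> float:
    """Return the absolute PI difference in percentage points.

    Assumption: Delta PI is taken as |predicted - reference|.
    """
    return abs(pi_predicted - pi_reference)


# Illustrative counts only (not data from the paper).
pi_model = proliferation_index(n_positive=30, n_negative=70)
pi_expert = proliferation_index(n_positive=34, n_negative=66)
error = delta_pi(pi_model, pi_expert)
```

Under these assumptions, the example model PI of 30% against a reference PI of 34% gives a $\Delta PI$ of 4 percentage points, the same order of magnitude as the SS+GS errors quoted above.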

Abstract

Deep learning systems have been proposed to improve the objectivity and efficiency of Ki-67 PI scoring. Although very accurate, deep learning techniques suffer from reduced performance when applied to out-of-domain data. This is a critical challenge for clinical translation, as models are typically trained using data available to the vendor, which is not from the target domain. To address this challenge, this study proposes a domain adaptation pipeline that employs an unsupervised framework to generate silver standard (pseudo) labels in the target domain, which are used to augment the gold standard (GS) source domain data. Five training regimes were tested on two validated Ki-67 scoring architectures (UV-Net and piNET): (1) SS Only: trained on target silver standard (SS) labels; (2) GS Only: trained on source GS labels; (3) Mixed: trained on target SS and source GS labels; (4) GS+SS: trained on source GS labels and fine-tuned on target SS labels; and our proposed method, (5) SS+GS: trained on target SS labels and fine-tuned on source GS labels. The SS+GS method yielded significantly (p < 0.05) higher PI accuracy (95.9%) and more consistent results compared to the GS Only model on target data. Analysis of t-SNE plots showed that features learned by the SS+GS models are more aligned for source and target data, resulting in improved generalization. The proposed pipeline provides an efficient method for learning the target distribution without manual annotations, which are time-consuming and costly to generate for medical images. This framework can be applied to any target site as a per-laboratory calibration method, for widescale deployment.
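The five regimes differ only in which label sets are used and in what order, so they can be written down as ordered training stages. The sketch below is one hypothetical way to encode that. Following the TL;DR, it treats SS labels as target-domain and GS labels as source-domain. The dataset keys, `REGIMES`, `run_regime`, and the `train_stage` callback are illustrative names, not the authors' code, and the actual optimiser settings for pre-training and fine-tuning are not given here.

```python
from typing import Callable, Dict, List, Optional, Tuple

# One stage = the label sets pooled together for a single round of training.
Stage = Tuple[str, ...]

# Assumption: "target_SS" and "source_GS" are placeholder dataset keys.
REGIMES: Dict[str, List[Stage]] = {
    "SS Only": [("target_SS",)],
    "GS Only": [("source_GS",)],
    "Mixed":   [("target_SS", "source_GS")],    # one stage on pooled data
    "GS+SS":   [("source_GS",), ("target_SS",)],  # train on GS, fine-tune on SS
    "SS+GS":   [("target_SS",), ("source_GS",)],  # pre-train on SS, fine-tune on GS
}


def run_regime(
    name: str,
    train_stage: Callable[[Optional[object], Stage], object],
    model: Optional[object] = None,
) -> object:
    """Run each stage of a regime in order, passing the model along."""
    for stage in REGIMES[name]:
        model = train_stage(model, stage)
    return model


def record_stage(model: Optional[object], stage: Stage) -> object:
    """Stand-in trainer that only records which stages were applied."""
    history = list(model) if model is not None else []
    history.append(stage)
    return history


ss_gs_history = run_regime("SS+GS", record_stage)
```

With a real trainer substituted for `record_stage`, the same table would drive both UV-Net and piNET, which keeps the comparison between regimes limited to data order rather than code paths.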

Paper Structure (16 sections, 1 equation, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Patient-level $\Delta$PI for piNET and UV-Net.
  • Figure 2: Mean $\Delta$PI across 152 patients. The interval [0, 10) contains 72 patients, [10, 20) contains 52 patients, [20, 30) contains 19 patients, and [30, 40) contains 8 patients.
  • Figure 3: t-SNE plots (perplexity 15) of features extracted from piNET models. Features from source (purple) and target (green) are diffuse in the GS Only and GS+SS methods but similar for the proposed SS+GS method (the domain gap is minimized). t-SNE hyperparameters are consistent between visualizations.
  • Figure 4: F1 scores for piNET and UV-Net on source dataset.
  • Figure 5: F1 scores for piNET and UV-Net on the target dataset.
  • ...and 1 more figure
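The Figure 2 caption groups patients into half-open PI intervals of width 10. As a rough illustration of that grouping, the sketch below bins a list of values so that each lower bound is included and each upper bound is excluded. The function name `bin_counts` and the sample values are hypothetical; the per-patient values behind the figure are not available here.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def bin_counts(values: Iterable[float], width: int = 10) -> Dict[Tuple[int, int], int]:
    """Count values per half-open interval [lower, lower + width)."""
    counts: Counter = Counter()
    for value in values:
        if value < 0:
            raise ValueError("values must be non-negative")
        lower = int(value // width) * width  # floor to the start of the bin
        counts[(lower, lower + width)] += 1
    return dict(sorted(counts.items()))


# Illustrative values only (not data from the paper).
example = bin_counts([0, 9.9, 10, 25.5, 39.9])
```

A value of exactly 10 falls in [10, 20) rather than [0, 10), which matches the bracket notation used in the caption.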