Optimizing Lymphocyte Detection in Breast Cancer Whole Slide Imaging through Data-Centric Strategies

Amine Marzouki, Zhuxian Guo, Qinghe Zeng, Camille Kurtz, Nicolas Loménie

TL;DR

The paper addresses the challenge of quantifying lymphocytes in breast cancer histology, a task central to TIL-based biomarkers, by testing a data-centric optimization pipeline on an off-the-shelf YOLOv5 detector. By focusing on dataset curation and novel upsampling plus visual-coherence transformations, the authors achieve strong lymphocyte detection without architectural changes, demonstrated on the TIGER breast cancer dataset where FROC improved to 0.573 and the model rose from 199th to 6th place. The approach also generalizes to colorectal cancer HES slides, suggesting clinical applicability without immunohistochemistry. This work highlights the critical role of data preparation in computational pathology, offering a scalable path to reproducible lymphocyte biomarkers using standard models.

Abstract

Efficient and precise quantification of lymphocytes in histopathology slides is essential for characterizing the tumor microenvironment and for gaining insight into immunotherapy response. We developed a data-centric optimization pipeline that attains strong lymphocyte detection performance using an off-the-shelf YOLOv5 model, without any architectural modifications. Our contribution, which relies on strategic dataset augmentation, includes novel biological upsampling and custom visual cohesion transformations tailored to the unique properties of tissue imagery, and it dramatically improves model performance. Our optimization reveals a key insight: given intensive customization, standard computational pathology models can support high-quality biomarker development without increased architectural complexity. We demonstrate the value of this approach in breast cancer, where our strategies lead to good lymphocyte detection performance, consistent with a broader shift toward data-centric methods. Furthermore, our data curation techniques support key histological analysis benchmarks and point to improved generalization potential.

Paper Structure (8 sections, 4 figures, 1 table)

Figures (4)

  • Figure 1: ROI examples from the TIGER challenge dataset [TigerChallenge]. (C1) tissue borders and lymphocyte bounding boxes / labels. (D1) combination of the tissue compartment masks and bounding boxes.
  • Figure 2: Histogram of the patch sizes. The majority of patches were very small and required careful upsampling to generate natural-looking $256 \times 256$ segments. Only a small subset of larger images was tiled into segments retaining full lymphocytes. Tailored upsampling strategies were critical due to the imbalanced nature of the dataset.
  • Figure 3: Preprocessing pipeline (the patch size denotes the maximum of the patch height and width). Small patches were upsampled into natural-looking $256 \times 256$ images via mirroring, cropping, lymphocyte transplanting and minor augmentations.
  • Figure 4: Validation results on TIGER data and generalization on colorectal cancer slides. (a) shows local validation set patches with ground truth lymphocyte annotations. (b) displays predictions from our model on the same patches. (c) illustrates the TIGER-trained model's original inferences on proprietary colorectal HES slides, demonstrating its potential to be adapted to replace costly IHC methods.
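
The mirroring step named in the Figure 3 caption can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`mirror_index`, `mirror_upsample`), the box format (`x1, y1, x2, y2` in pixel-edge coordinates) and the rule of dropping mirrored boxes that cross the output border are all assumptions made for illustration. Cropping, lymphocyte transplanting and the minor augmentations are not covered.

```python
import numpy as np

TARGET = 256  # output side length, taken from the figure captions


def mirror_index(n, target):
    """Source index for each output pixel when a length-n axis is
    extended to `target` by alternating original and mirrored copies."""
    k = np.arange(target) % (2 * n)
    return np.where(k < n, k, 2 * n - 1 - k)


def mirrored_intervals(lo, hi, n, target):
    """Positions of the interval [lo, hi) in every copy along one axis."""
    out = []
    t = 0
    while t * n < target:
        if t % 2 == 0:          # even copies keep the original orientation
            out.append((t * n + lo, t * n + hi))
        else:                   # odd copies are flipped
            out.append((t * n + n - hi, t * n + n - lo))
        t += 1
    return out


def mirror_upsample(patch, boxes, target=TARGET):
    """Extend a small (H, W, C) patch to (target, target, C) by mirroring.

    Assumption: boxes are (x1, y1, x2, y2) tuples; mirrored boxes that
    would be cut by the output border are dropped rather than clipped.
    """
    h, w = patch.shape[:2]
    out = patch[mirror_index(h, target)][:, mirror_index(w, target)]

    new_boxes = []
    for x1, y1, x2, y2 in boxes:
        for nx1, nx2 in mirrored_intervals(x1, x2, w, target):
            for ny1, ny2 in mirrored_intervals(y1, y2, h, target):
                if nx2 <= target and ny2 <= target:
                    new_boxes.append((nx1, ny1, nx2, ny2))
    return out, new_boxes


if __name__ == "__main__":
    # Synthetic 100 x 60 patch with one hypothetical lymphocyte box.
    patch = np.arange(100 * 60 * 3).reshape(100, 60, 3)
    image, labels = mirror_upsample(patch, [(10, 20, 20, 30)])
    print(image.shape, len(labels))
```

Reflecting the bounding boxes together with the pixels keeps every mirrored lymphocyte labeled, so the detector is not penalized for finding cells in the mirrored regions.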