SMILE: a Scale-aware Multiple Instance Learning Method for Multicenter STAS Lung Cancer Histopathology Diagnosis

Liangrui Pan, Xiaoyu Li, Yutao Dou, Qiya Song, Jiadi Luo, Qingchun Liang, Shaoliang Peng

TL;DR

Spread Through Air Spaces (STAS) is a newly described and prognostically significant invasion pattern in lung cancer whose diagnosis currently relies on subjective, time-consuming manual pathology review. This paper introduces SMILE, a scale-aware multiple instance learning framework that uses a scale-adaptive attention mechanism to robustly detect sparse, heterogeneous STAS features in whole-slide images (WSIs). The authors construct three multicenter STAS datasets (STAS_CSU, STAS_TCGA, STAS_CPTAC) totaling 2,970 WSIs and benchmark 11 MIL baselines, with SMILE delivering competitive accuracy and AUC, especially on STAS_CPTAC. The study provides public datasets and a strong baseline for STAS research, highlighting scale-aware feature aggregation as a key factor for reliable computational pathology in multicenter settings.

Abstract

Spread through air spaces (STAS) is a newly identified aggressive pattern in lung cancer that is associated with adverse prognostic factors and complex pathological features. Pathologists currently rely on time-consuming manual assessments, which are highly subjective and prone to variation. This highlights the urgent need for automated and precise diagnostic solutions. We collected 2,970 lung cancer tissue slides from multiple centers, re-diagnosed them, and constructed and publicly released three lung cancer STAS datasets: STAS_CSU (hospital), STAS_TCGA, and STAS_CPTAC. All STAS datasets provide corresponding pathological feature diagnoses and related clinical data. To address the biased, sparse, and heterogeneous nature of STAS, we propose a scale-aware multiple instance learning (SMILE) method for STAS diagnosis of lung cancer. By introducing a scale-adaptive attention mechanism, SMILE can adaptively adjust high-attention instances, reducing over-reliance on local regions and promoting consistent detection of STAS lesions. Extensive experiments show that SMILE achieved competitive diagnostic results on STAS_CSU, diagnosing 251 and 319 STAS samples in CPTAC and TCGA, respectively, surpassing clinical average AUC. The 11 open baseline results are the first to be established for STAS research, laying the foundation for the future expansion, interpretability, and clinical integration of computational pathology technologies. The datasets and code are available at https://anonymous.4open.science/r/IJCAI25-1DA1.

Paper Structure

This paper contains 20 sections, 15 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Three common pathological features of STAS in lung cancer histopathology images. STAS is mainly distributed outside the main tumor body in the form of solid cell nests, micropapillary clusters, and single cancer cells.
  • Figure 2: The process of constructing the three STAS datasets.
  • Figure 3: (a) Overall workflow of the proposed SMILE approach. (b) The process of feature preprocessing. The given bag is passed through a joint feature representation module, which transforms its instances into instance features. These features are then processed by a scale-adaptive attention module to obtain a scaled bag-level feature representation. Finally, the classifier $g$ produces the STAS prediction from this bag-level representation.
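
The workflow described for Figure 3 (instance features, attention-based aggregation into a bag-level feature, then a classifier $g$) can be illustrated with a minimal sketch. This is not the authors' implementation: the source does not specify how the scale-adaptive attention is computed, so the sketch substitutes generic attention pooling with a softmax temperature as a stand-in for the "scale". The names `attention_pool`, `classify`, `temperature`, `w`, and `v` are all hypothetical, and the feature values are toy numbers.

```python
import math


def softmax(scores, temperature=1.0):
    """Numerically stable softmax; a larger temperature flattens the weights."""
    m = max(scores)
    exps = [math.exp((s - m) / temperature) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(instances, w, temperature=1.0):
    """Aggregate instance features into one bag-level feature.

    instances:   list of equal-length feature vectors (one per patch)
    w:           hypothetical scoring vector (would be learned in practice)
    temperature: assumed stand-in for the paper's scale; NOT the authors' formula
    """
    # One scalar attention score per instance (a plain dot product here).
    scores = [sum(wi * xi for wi, xi in zip(w, x)) for x in instances]
    weights = softmax(scores, temperature)
    dim = len(instances[0])
    # Bag feature = attention-weighted sum of the instance features.
    bag = [sum(a * x[d] for a, x in zip(weights, instances)) for d in range(dim)]
    return bag, weights


def classify(bag, v, bias=0.0):
    """Stand-in for the classifier g: logistic regression on the bag feature."""
    z = sum(vi * bi for vi, bi in zip(v, bag)) + bias
    return 1.0 / (1.0 + math.exp(-z))


# Toy bag of three instances with two-dimensional features.
instances = [[1.0, 0.0], [0.0, 1.0], [3.0, 0.0]]
w = [1.0, 0.0]

bag_sharp, weights_sharp = attention_pool(instances, w, temperature=1.0)
bag_flat, weights_flat = attention_pool(instances, w, temperature=4.0)
prob = classify(bag_flat, v=[0.5, -0.5])
```

In this toy setting, raising `temperature` lowers the largest attention weight, so the bag feature depends less on a single high-scoring instance. That mirrors, in spirit only, the stated goal of reducing over-reliance on local regions.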