Random forest-based out-of-distribution detection for robust lung cancer segmentation
Aneesh Rangnekar, Harini Veeraraghavan
TL;DR
The paper addresses the loss of accuracy that transformer-based lung cancer segmentation models for CT suffer on out-of-distribution (OOD) data. It introduces RF-Deep, a random forest classifier that detects OOD scans using deep features from the segmentation model's SimMIM-pretrained Swin Transformer encoder. Across five public datasets, RF-Deep outperforms standard OOD methods (MaxSoftmax, MaxLogits, energy, and entropy) and a radiomics-based benchmark, achieving high AUROC and low FPR95 across near- and far-OOD scenarios. The approach is lightweight and offers interpretability through SHAP and visualization analyses. It could be deployed more broadly in the clinic and extended to other disease sites.
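The baseline scores named above (MaxSoftmax, MaxLogits, energy, entropy) are standard functions of a model's output logits. The sketch computes them for one voxel's class logits, averages them over a scan, and evaluates FPR95. Several details are assumptions for illustration, because the abstract does not specify how the paper turns voxel outputs into scan-level scores:

- the per-voxel-mean aggregation;
- the sign convention (larger score means more likely OOD);
- the ID-as-positive FPR95 convention;
- all function names.

```python
import math


def softmax(logits):
    # Numerically stable softmax over one voxel's class logits.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def logsumexp(xs):
    # Stable log(sum(exp(x))), used by the energy score.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def voxel_ood_scores(logits, temperature=1.0):
    """Per-voxel baseline scores, oriented so that larger means 'more likely OOD'."""
    p = softmax(logits)
    return {
        "max_softmax": -max(p),            # low top-class probability -> OOD
        "max_logit": -max(logits),         # low top-class logit -> OOD
        # Energy E = -T * logsumexp(z / T); ID inputs tend to have lower energy.
        "energy": -temperature * logsumexp([z / temperature for z in logits]),
        "entropy": -sum(pi * math.log(pi) for pi in p if pi > 0.0),
    }


def scan_ood_score(voxel_logits, method, temperature=1.0):
    # Hypothetical scan-level aggregation: mean of the per-voxel scores.
    vals = [voxel_ood_scores(z, temperature)[method] for z in voxel_logits]
    return sum(vals) / len(vals)


def fpr_at_95_tpr(id_scores, ood_scores, tpr=0.95):
    """FPR95 with ID as the positive class (one common convention).

    The threshold is chosen so that at least `tpr` of ID scans are accepted
    (score <= threshold); FPR95 is the fraction of OOD scans also accepted.
    """
    ranked = sorted(id_scores)
    k = math.ceil(tpr * len(ranked))
    threshold = ranked[k - 1]
    return sum(s <= threshold for s in ood_scores) / len(ood_scores)
```

For example, a scan whose voxels have confident logits such as `[[6.0, -3.0], [5.0, -2.0]]` receives a lower score under all four methods than a scan with near-uniform logits such as `[[0.2, 0.0], [0.1, 0.3]]`. A lower FPR95 means fewer OOD scans are mistaken for ID at the threshold that keeps 95% of ID scans.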
Abstract
Accurate detection and segmentation of cancerous lesions from computed tomography (CT) scans is essential for automated treatment planning and cancer treatment response assessment. Transformer-based models with self-supervised pretraining produce accurate segmentations on in-distribution (ID) data but degrade when applied to out-of-distribution (OOD) datasets. We address this challenge with RF-Deep, a random forest classifier that uses deep features from the segmentation model's pretrained transformer encoder to detect OOD scans and thereby improve segmentation reliability. The segmentation model comprises two parts. The first is a Swin Transformer encoder, pretrained with masked image modeling (SimMIM) on 10,432 unlabeled 3D CT scans covering cancerous and non-cancerous conditions. The second is a convolutional decoder. The model was trained to segment lung cancers in 317 3D scans. Independent testing was performed on 603 3D CT scans from five public datasets: one ID dataset and four OOD datasets. The OOD datasets comprise chest CTs with pulmonary embolism (PE) and COVID-19, and abdominal CTs with kidney cancers and healthy volunteers. RF-Deep detected OOD cases with an FPR95 of 18.26%, 27.66%, and less than 0.1% on PE, COVID-19, and abdominal CTs, respectively, consistently outperforming established OOD approaches. The RF-Deep classifier provides a simple and effective way to enhance the reliability of cancer segmentation in both ID and OOD scenarios.
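The RF-Deep idea of a random forest on encoder features can be sketched with scikit-learn. This is an illustration of the general approach, not the authors' implementation. It rests on several assumptions that the abstract does not state:

- each scan's hierarchical encoder feature maps are reduced by global average pooling and concatenated into one vector;
- the forest is fitted on labeled ID and OOD training vectors;
- `pool_encoder_features`, `fit_ood_forest`, and `evaluate` are hypothetical helper names.

AUROC and FPR95 are computed with `roc_auc_score` and `roc_curve`, treating ID as the positive class.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, roc_curve


def pool_encoder_features(stage_maps):
    """Global-average-pool each encoder stage and concatenate into one scan-level vector.

    Assumption: `stage_maps` is a list of arrays shaped (C, D, H, W), one per
    hierarchical Swin encoder stage, computed for a single CT scan.
    """
    return np.concatenate([m.reshape(m.shape[0], -1).mean(axis=1) for m in stage_maps])


def _stack(id_feats, ood_feats):
    # Label ID scans 1 and OOD scans 0.
    X = np.vstack([id_feats, ood_feats])
    y = np.concatenate([np.ones(len(id_feats), dtype=int), np.zeros(len(ood_feats), dtype=int)])
    return X, y


def fit_ood_forest(id_feats, ood_feats, n_estimators=200, seed=0):
    """Fit a random forest that separates ID from OOD feature vectors."""
    X, y = _stack(id_feats, ood_feats)
    clf = RandomForestClassifier(n_estimators=n_estimators, random_state=seed)
    return clf.fit(X, y)


def evaluate(clf, id_feats, ood_feats):
    """Return (AUROC, FPR95), using the forest's P(ID) as the score."""
    X, y = _stack(id_feats, ood_feats)
    p_id = clf.predict_proba(X)[:, 1]  # classes_ is sorted [0, 1], so column 1 is ID
    fpr, tpr, _ = roc_curve(y, p_id, drop_intermediate=False)
    # First operating point that accepts at least 95% of ID scans.
    fpr95 = fpr[np.searchsorted(tpr, 0.95)]
    return roc_auc_score(y, p_id), fpr95
```

In practice, `stage_maps` would come from a forward pass of the pretrained encoder on each scan. The forest's `predict_proba` output could then be thresholded to flag scans whose segmentations may be unreliable. Pooling keeps the classifier small. A random forest also works naturally with tree-based attribution tools such as SHAP.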
