Tumor-anchored deep feature random forests for out-of-distribution detection in lung cancer segmentation

Aneesh Rangnekar, Harini Veeraraghavan

TL;DR

This work addresses unreliable tumor segmentation under distribution shifts in CT imaging. It introduces RF-Deep, a post-hoc OOD detector that takes deep features from an SSL-pretrained segmentation backbone, anchors them to predicted tumor regions, and aggregates them with a random forest trained with outlier exposure. RF-Deep outperforms logit-based and radiomics approaches on both near- and far-OOD detection while requiring minimal changes to existing segmentation pipelines, and its performance is robust across pretrained backbones and architectures. The approach is interpretable (via SHAP) and scalable, offering a practical pathway to safer clinical deployment of automated lung tumor segmentation.

Abstract

Accurate segmentation of cancerous lesions from 3D computed tomography (CT) scans is essential for automated treatment planning and response assessment. However, even state-of-the-art models that combine self-supervised learning (SSL) pretrained transformers with convolutional decoders are susceptible to out-of-distribution (OOD) inputs: they generate confidently incorrect tumor segmentations, which poses risks for safe clinical deployment. Existing logit-based methods suffer from task-specific model biases, while architectural enhancements that explicitly detect OOD inputs increase parameter counts and computational costs. We therefore introduce RF-Deep, a lightweight, plug-and-play, post-hoc OOD detection framework based on random forests that leverages deep features with limited outlier exposure. RF-Deep enhances generalization to imaging variations by repurposing the hierarchical features from the pretrained-then-finetuned backbone encoder, and it provides task-relevant OOD detection by extracting these features from multiple regions of interest anchored to the predicted tumor segmentations. Because the regions are anchored to the prediction, the method scales to images of varying fields-of-view. We compared RF-Deep against existing OOD detection methods using 1,916 CT scans across near-OOD (pulmonary embolism, COVID-19-negative) and far-OOD (kidney cancer, healthy pancreas) datasets. RF-Deep achieved AUROC > 93.50 for the challenging near-OOD datasets and near-perfect detection (AUROC > 99.00) for the far-OOD datasets, substantially outperforming logit-based and radiomics approaches. RF-Deep maintained consistent performance across networks of different depths and pretraining strategies, demonstrating its effectiveness as a lightweight, architecture-agnostic approach to enhance the reliability of tumor segmentation from CT volumes.
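
The pipeline described in the abstract (per-region deep features, a random forest trained with limited outlier exposure, and a scan-level AUROC) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes NumPy and scikit-learn are available, replaces the encoder features with synthetic Gaussian vectors, and averages per-region probabilities into a scan-level score; the names `fake_scan_features` and `scan_scores`, the sizes `N_ROIS` and `FEAT_DIM`, the mean aggregation rule, and all sample counts are hypothetical choices made only for illustration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
N_ROIS, FEAT_DIM = 4, 16  # hypothetical: regions per scan, feature length


def fake_scan_features(n_scans, shift):
    """Synthetic stand-in for pooled encoder features from tumor-anchored
    regions, shaped (n_scans, N_ROIS, FEAT_DIM). Not real CT features."""
    return rng.normal(loc=shift, scale=1.0, size=(n_scans, N_ROIS, FEAT_DIM))


# Outlier exposure: many ID scans (label 0), only a few OOD scans (label 1).
id_train = fake_scan_features(80, shift=0.0)
ood_train = fake_scan_features(10, shift=1.5)
X_train = np.concatenate([id_train, ood_train]).reshape(-1, FEAT_DIM)
y_train = np.repeat([0, 1], [80 * N_ROIS, 10 * N_ROIS])

# class_weight="balanced" compensates for the small exposed-outlier set.
rf = RandomForestClassifier(
    n_estimators=100, class_weight="balanced", random_state=0
)
rf.fit(X_train, y_train)


def scan_scores(scans):
    """Scan-level OOD score: mean OOD probability over a scan's regions
    (mean aggregation is an assumption of this sketch)."""
    probs = rf.predict_proba(scans.reshape(-1, FEAT_DIM))[:, 1]
    return probs.reshape(len(scans), N_ROIS).mean(axis=1)


# Held-out evaluation on unseen synthetic ID and OOD scans.
id_test = fake_scan_features(50, shift=0.0)
ood_test = fake_scan_features(50, shift=1.5)
scores = np.concatenate([scan_scores(id_test), scan_scores(ood_test)])
labels = np.repeat([0, 1], 50)

auroc = roc_auc_score(labels, scores)
print(f"scan-level AUROC: {100 * auroc:.2f}")  # percent scale, as in the abstract
```

Because the detector only consumes feature vectors, the segmentation model itself stays frozen in a setup like this, which is the sense in which a post-hoc detector is plug-and-play. The synthetic data here are far easier to separate than real near-OOD scans, so the printed value says nothing about the paper's results.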

Paper Structure

This paper contains 29 sections, 2 equations, 12 figures, 7 tables.

Figures (12)

  • Figure 1: Uncertainty maps for in-distribution (ID) and out-of-distribution (OOD) scans. (a) ID lung cancer cases (1–2) show concentrated boundary uncertainty; OOD cases (3–6: pulmonary embolism, COVID-19-negative, kidney cancer, healthy abdomen) exhibit concentrated, diffused, or misaligned patterns.
  • Figure 2: RF-Deep workflow for scan-level OOD detection. Panels (a–c) depict feature extraction from the frozen segmentation model and random forest training with outlier exposure; panel (d) shows scan-level ID/OOD inference using the trained detector.
  • Figure 3: Performance and robustness of pretraining strategies on the ID test set. (a) Segmentation metrics (DSC and HD95). (b) Performance across imaging variations. (c) Representative segmentations; R denotes reconstruction kernel.
  • Figure 4: t-SNE projected embeddings showing dataset-wise separability of (a) deep features and (b) radiomics features. Results from one representative split (of 100), combining all datasets in a single visualization, are shown for brevity. The shaded blue regions indicate the convex hull of the ID dataset.
  • Figure 5: Scan-level OOD detection score distributions for RF-Radiomics (a) and RF-Deep (b) across 100 matched-seed runs.
  • ...and 7 more figures