
R$^{2}$Seg: Training-Free OOD Medical Tumor Segmentation via Anatomical Reasoning and Statistical Rejection

Shuaike Shen, Ke Liu, Jiaqing Xie, Shangde Gao, Chunhua Shen, Ge Liu, Mireia Crispin-Ortuzar, Shangqi Gao

TL;DR

R$^{2}$Seg addresses the challenge of out-of-distribution tumor segmentation by combining a training-free, anatomy-aware Reason step with a statistically principled Reject step. An LLM-guided planner localizes organ anchors and generates multi-scale ROIs, constraining a frozen segmentation model to anatomically plausible regions. A nonparametric two-sample test with FDR control (via $\mathrm{MMD}^2$ and permutation testing) filters out false positives, significantly improving specificity while preserving sensitivity. The approach avoids catastrophic forgetting by not updating model weights, and achieves strong OOD performance across multi-center, multi-modal benchmarks, enabling safer, deployment-friendly tumor parsing.

Abstract

Foundation models for medical image segmentation struggle under out-of-distribution (OOD) shifts, often producing fragmented false positives on OOD tumors. We introduce R$^{2}$Seg, a training-free framework for robust OOD tumor segmentation that operates via a two-stage Reason-and-Reject process. First, the Reason step employs an LLM-guided anatomical reasoning planner to localize organ anchors and generate multi-scale ROIs. Second, the Reject step applies two-sample statistical testing to candidates generated by a frozen foundation model (BiomedParse) within these ROIs. This statistical rejection filter retains only candidates significantly different from normal tissue, effectively suppressing false positives. Our framework requires no parameter updates, making it compatible with zero-update test-time augmentation and avoiding catastrophic forgetting. On multi-center and multi-modal tumor segmentation benchmarks, R$^{2}$Seg substantially improves Dice, specificity, and sensitivity over strong baselines and the original foundation models. Code is available at https://github.com/Eurekashen/R2Seg.
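The Reject step can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes each candidate region and a normal-tissue reference are already represented as arrays of feature vectors, uses a Gaussian kernel with a median-heuristic bandwidth and the biased $\mathrm{MMD}^2$ estimator, and applies the Benjamini-Hochberg procedure for FDR control. The text above names MMD, permutation testing, and FDR control but not these specific choices, and the function names `mmd_permutation_pvalue` and `benjamini_hochberg` are hypothetical.

```python
import numpy as np


def mmd2_biased(k, n):
    # k: kernel matrix of the pooled sample; the first n rows/cols belong to sample X.
    return k[:n, :n].mean() + k[n:, n:].mean() - 2.0 * k[:n, n:].mean()


def mmd_permutation_pvalue(x, y, n_perm=500, seed=0):
    """Permutation p-value for H0: x and y come from the same distribution."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    n = x.shape[0]
    # Pairwise squared Euclidean distances over the pooled sample.
    sq = np.sum(pooled**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * pooled @ pooled.T, 0.0)
    # Assumption: Gaussian kernel whose scale is the median squared distance.
    scale = np.median(d2[np.triu_indices_from(d2, k=1)]) + 1e-12
    k = np.exp(-d2 / scale)
    observed = mmd2_biased(k, n)
    exceed = 0
    for _ in range(n_perm):
        # Relabel the pooled points at random and recompute the statistic.
        idx = rng.permutation(pooled.shape[0])
        if mmd2_biased(k[np.ix_(idx, idx)], n) >= observed:
            exceed += 1
    # The +1 correction keeps the p-value strictly positive.
    return (exceed + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues, alpha=0.05):
    """Boolean mask of hypotheses rejected at FDR level alpha."""
    p = np.asarray(pvalues, dtype=float)
    m = p.size
    order = np.argsort(p)
    passed = p[order] <= alpha * np.arange(1, m + 1) / m
    keep = np.zeros(m, dtype=bool)
    if passed.any():
        # Reject every hypothesis up to the largest rank that passes.
        cutoff = np.max(np.nonzero(passed)[0])
        keep[order[: cutoff + 1]] = True
    return keep
```

In this sketch a candidate is retained as a tumor only when its mask entry is `True`, that is, when its features differ significantly from the normal-tissue reference after FDR correction; all other candidates are discarded as likely false positives.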

Paper Structure

This paper contains 61 sections, 18 equations, 7 figures, 5 tables, 3 algorithms.

Figures (7)

  • Figure 1: Illustration of visual embedding distributions. Left: In-Distribution, Right: Out-of-Distribution.
  • Figure 2: Overview of the R$^{2}$Seg pipeline. Top row: LLM-based segmentation planning and ROI construction; middle row: BiomedParse-based tumor segmentation and candidate extraction; bottom row: statistical two-sample test and false discovery rate control.
  • Figure 3: Visualization of segmentation results for both in-distribution and out-of-distribution tumor types.
  • Figure 4: Evaluation of over-segmentation on slices without tumors. Significance levels are annotated with asterisks: ns: $0.05 < p \le 1$, *: $0.01 < p \le 0.05$, **: $0.001 < p \le 0.01$, ***: $0.0001 < p \le 0.001$, and ****: $p \le 0.0001$.
  • Figure 5: Evaluation of knowledge forgetting on in-distribution CT slices. Statistical tests show that the segmentation performance of fine-tuned models drops significantly across all organs.
  • ...and 2 more figures