FreeTumor: Advance Tumor Segmentation via Large-Scale Tumor Synthesis

Linshan Wu, Jiaxin Zhuang, Xuefeng Ni, Hao Chen

TL;DR

FreeTumor tackles the annotation bottleneck in tumor segmentation with an annotation-free tumor synthesis pipeline that scales to large unlabeled CT data. It uses adversarial training to synthesize high-quality tumors and an online, segmentation-based discriminator to filter out low-quality samples, so that segmentation models can be trained effectively on a mix of real and synthetic data. Across the LiTS, MSD, and KiTS benchmarks and the FLARE23 leaderboard, FreeTumor delivers substantial gains over real-data baselines and prior synthesis methods, with the 11k-case regime achieving the strongest performance. The approach demonstrates a practical data scaling law for tumor segmentation and offers a release-ready synthetic tumor dataset to spur further research.

Abstract

AI-driven tumor analysis has garnered increasing attention in healthcare. However, its progress is significantly hindered by the lack of annotated tumor cases, since collecting and annotating them requires substantial effort from radiologists. In this paper, we introduce a highly practical solution for robust tumor synthesis and segmentation, termed FreeTumor, which refers to annotation-free synthetic tumors and our desire to free patients suffering from tumors. Instead of pursuing sophisticated technical synthesis modules, we aim to design a simple yet effective tumor synthesis paradigm to unleash the power of large-scale data. Specifically, FreeTumor advances existing methods in three main aspects: (1) Existing methods leverage only small-scale labeled data for synthesis training, which limits their ability to generalize to unseen data from different sources. To address this, we introduce an adversarial training strategy that leverages large-scale, diverse unlabeled data in synthesis training, significantly improving tumor synthesis. (2) Existing methods largely ignore the negative impact of low-quality synthetic tumors on segmentation training. We therefore employ an adversarial-based discriminator to automatically filter out low-quality synthetic tumors, which effectively alleviates their negative impact. (3) Existing methods use only hundreds of cases in tumor segmentation. In FreeTumor, we investigate the data scaling law in tumor segmentation by scaling up the dataset to 11k cases. Extensive experiments demonstrate the superiority of FreeTumor: on three tumor segmentation benchmarks, it achieves an average of $+8.9\%$ DSC over the baseline that uses only real tumors and $+6.6\%$ DSC over the state-of-the-art tumor synthesis method. Code will be available.
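The gains above are reported in DSC (Dice similarity coefficient), which measures the overlap between a predicted mask and a reference mask as $2|A \cap B| / (|A| + |B|)$. The sketch below is only an illustration of that standard formula, not the authors' evaluation code. It assumes masks are represented as sets of voxel coordinates, and the name `dice_score` and the empty-mask convention are choices made here for the example.

```python
from typing import Set, Tuple

Voxel = Tuple[int, int, int]  # (z, y, x) index of a voxel; an assumed representation


def dice_score(pred: Set[Voxel], target: Set[Voxel]) -> float:
    """Dice similarity coefficient between two binary masks given as voxel sets."""
    if not pred and not target:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    overlap = len(pred & target)
    return 2.0 * overlap / (len(pred) + len(target))


# Toy masks: 3 voxels each, 2 of them shared.
pred = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}
target = {(0, 0, 1), (0, 1, 0), (1, 1, 1)}
print(round(dice_score(pred, target), 4))  # 2 * 2 / (3 + 3)
```

In practice, volumetric masks are usually stored as dense arrays, but the overlap formula is the same.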

Paper Structure (19 sections, 8 equations, 10 figures, 15 tables)


Figures (10)

  • Figure 1: Synthesizing different types of tumors with AI. The green and red arrows point to real and synthetic tumors, respectively. We collect 0.9k labeled and 10k unlabeled data to facilitate tumor segmentation. By unleashing the power of large-scale data, FreeTumor outperforms the previous methods DiffTumor, SynTumor, and nnUNet by a significant margin.
  • Figure 2: The overall framework of FreeTumor, which comprises three stages. (1) Training a segmentor as the discriminator. We first use the labeled data to train a baseline segmentor, which serves as the discriminator in the subsequent generative model. Sup. denotes supervision. (2) Synthesizing tumors with adversarial training. The tumor position simulation follows SynTumor and DiffTumor. We use both labeled and unlabeled data to train the tumor synthesis model. Thanks to previous data-collection efforts (AbdomenAtlas, AbdomenCT-1K, MSD, FLARE22), we can assemble an 11k dataset with organ labels for tumor position simulation. Motivated by OASIS, we use the baseline segmentor to judge how realistic the synthetic tumors are. (3) Training the tumor segmentation model. We generate and filter synthetic tumors on the large-scale data to train the final tumor segmentor.
  • Figure 3: Segmentation-based filtering strategy for synthetic tumors. (a) We discard unsatisfactory synthetic tumors according to Eq. (\ref{eqn_turing}). (b) We use the baseline segmentor $S$ to evaluate the synthetic tumors, measured by segmentation DSC. With the proposed filtering strategy, the DSC of synthetic tumors improves significantly (average $+16.1\%$), surpassing SynTumor and DiffTumor by a large margin.
  • Figure 4: Average DSC when scaling up data.
  • Figure 5: Data scaling law in tumor segmentation. The DiffTumor result on 11k is re-implemented.
  • ...and 5 more figures
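
The segmentation-based filtering described in the Figure 3 caption can be sketched roughly as follows. This is a hedged illustration, not the paper's criterion: the exact rule is the equation the caption refers to, which is not reproduced in this section. The sketch assumes a sample is kept when the baseline segmentor labels a large enough fraction of the synthetic tumor voxels as tumor. The names `tumor_hit_ratio`, `filter_synthetic`, and `toy_segmentor`, the dictionary image representation, and the `0.7` threshold are all hypothetical.

```python
from typing import Callable, Dict, List, Set, Tuple

Voxel = Tuple[int, int, int]
Image = Dict[Voxel, float]         # toy image: voxel -> intensity (assumed representation)
Sample = Tuple[Image, Set[Voxel]]  # (image containing a synthetic tumor, synthetic tumor mask)


def tumor_hit_ratio(synthetic_mask: Set[Voxel], predicted_tumor: Set[Voxel]) -> float:
    """Fraction of synthetic tumor voxels that the segmentor also labels as tumor."""
    if not synthetic_mask:
        return 0.0
    return len(synthetic_mask & predicted_tumor) / len(synthetic_mask)


def filter_synthetic(
    samples: List[Sample],
    segmentor: Callable[[Image], Set[Voxel]],
    threshold: float = 0.7,  # hypothetical value, not taken from the source
) -> List[Sample]:
    """Keep only samples whose synthetic tumor is recognized well enough by the segmentor."""
    return [
        (image, mask)
        for image, mask in samples
        if tumor_hit_ratio(mask, segmentor(image)) >= threshold
    ]


def toy_segmentor(image: Image) -> Set[Voxel]:
    """Stand-in for the baseline segmentor S: marks negative-intensity voxels as tumor."""
    return {voxel for voxel, intensity in image.items() if intensity < 0}


good_image = {(0, 0, 0): -50.0, (0, 0, 1): -40.0, (0, 1, 0): 80.0}
good_mask = {(0, 0, 0), (0, 0, 1)}             # both voxels recognized: ratio 1.0
bad_image = {(0, 0, 0): 90.0, (0, 0, 1): -40.0, (0, 1, 0): 80.0}
bad_mask = {(0, 0, 0), (0, 0, 1), (0, 1, 0)}   # one of three recognized: ratio 1/3

kept = filter_synthetic([(good_image, good_mask), (bad_image, bad_mask)], toy_segmentor)
print(len(kept))
```

A real pipeline would replace `toy_segmentor` with the trained baseline segmentor and run the check online while synthetic tumors are generated.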