Towards Generalizable Tumor Synthesis

Qi Chen; Xiaoxi Chen; Haorui Song; Zhiwei Xiong; Alan Yuille; Chen Wei; Zongwei Zhou

TL;DR

DiffTumor tackles cross-organ tumor synthesis by building on the observation that early-stage tumors ($<2$ cm) share CT imaging features across organs. It introduces a three-stage framework: a 3D autoencoder trained on 9,262 unlabeled CT volumes to learn latent representations, a latent-space diffusion model conditioned on a tumor mask $m$ and the healthy latent $z_{0,\text{healthy}}$, and a segmentation model trained on synthetic tumors to augment real-data training. The approach uses large healthy-organ CT datasets to synthesize diverse tumors, and generalization is evaluated across organs and demographics. Reported gains in Dice similarity coefficient (DSC) and sensitivity on real tumors hold across hospitals and backbones (e.g., +10.7% DSC across organs, +6.9% DSC and +16.4% sensitivity across demographics), with real-time synthesis at $T=4$. A Visual Turing Test indicates that the synthetic tumors approach the realism of real ones, and ablation results show benefits in three areas: reduced annotation requirements, accelerated synthesis, and enhanced early-tumor detection. Overall, DiffTumor offers a practical, data-efficient path to robust, cross-domain tumor segmentation via synthetic augmentation, potentially reducing annotation costs and improving clinical AI deployment.
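For readers unfamiliar with the two metrics quoted above, the following is a minimal sketch of how DSC and sensitivity can be computed on binary masks. It is not taken from the paper: the function names are hypothetical, and sensitivity is computed per voxel here as an assumption, since this summary does not say whether the reported sensitivity is counted per voxel or per tumor.

```python
import numpy as np

def dice_coefficient(pred, target):
    """DSC = 2|P ∩ G| / (|P| + |G|) for two binary masks of equal shape."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    intersection = np.logical_and(pred, target).sum()
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Both masks empty: treated as perfect agreement (a common convention).
        return 1.0
    return float(2.0 * intersection / denom)

def voxel_sensitivity(pred, target):
    """TP / (TP + FN) over voxels (assumption: voxel-level, not tumor-level)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    positives = target.sum()
    if positives == 0:
        return float("nan")  # undefined when the ground truth has no tumor voxels
    true_positives = np.logical_and(pred, target).sum()
    return float(true_positives / positives)

# Toy 3D example: an 8-voxel ground-truth tumor and a 12-voxel prediction
# that fully covers it.
gt = np.zeros((4, 4, 4), dtype=bool)
gt[1:3, 1:3, 1:3] = True
pr = np.zeros_like(gt)
pr[1:3, 1:3, 1:4] = True

print(dice_coefficient(pr, gt))   # 2 * 8 / (12 + 8) = 0.8
print(voxel_sensitivity(pr, gt))  # 8 / 8 = 1.0
```

In the toy case the prediction over-segments, so sensitivity is perfect while DSC is penalized, which is why the two numbers are usually reported together.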

Abstract

Tumor synthesis enables the creation of artificial tumors in medical images, facilitating the training of AI models for tumor detection and segmentation. However, success in tumor synthesis hinges on two things: creating visually realistic tumors that generalize across multiple organs, and ensuring that the resulting AI models can detect real tumors in images sourced from different domains (e.g., hospitals). This paper takes a step toward generalizable tumor synthesis by leveraging a critical observation: early-stage tumors (< 2 cm) tend to have similar imaging characteristics in computed tomography (CT), whether they originate in the liver, pancreas, or kidneys. We have ascertained that generative AI models, e.g., Diffusion Models, can create realistic tumors that generalize to a range of organs even when trained on a limited number of tumor examples from only one organ. Moreover, we have shown that AI models trained on these synthetic tumors generalize to detecting and segmenting real tumors in CT volumes, encompassing a broad spectrum of patient demographics, imaging protocols, and healthcare facilities.

Paper Structure (24 sections, 6 equations, 16 figures, 7 tables)

Introduction
Preliminary
DiffTumor
Autoencoder Model
Diffusion Model
Segmentation Model
Experiments & Results
Visual Turing Test
Generalizable to Different Organs
Generalizable to Different Demographics
Advantages of DiffTumor
Related Work
Conclusion
Visual Examples
Description of Radiomics Features
...and 9 more sections

Figures (16)

Figure 1: Generalizable tumor synthesis across organs. Early-stage tumors present similar imaging characteristics in computed tomography (CT), whether they are located in the liver, pancreas, or kidneys. Leveraging this observation, we develop a generative AI model on a few examples of annotated tumors in a specific organ, e.g., the liver (in purple). This AI model (in purple), trained exclusively on liver tumors, can directly create synthetic tumors in those organs where CT volumes of annotated tumors are relatively scarce, e.g., the pancreas (in cyan) and kidneys (in blue and green). By integrating synthetic tumors into extensive CT volumes of healthy organs---routinely collected in clinical settings---we can substantially augment the training set for tumor segmentation. This enhancement can also significantly improve the AI generalizability across CT volumes sourced from diverse hospitals and patient demographics.
Figure 2: Reader studies and feature analysis. We assess the performance of a support vector machine (SVM) classifier, using Radiomics features [chu2019utility], and three expert radiologists in identifying the originating organs of cropped tumors. The SVM classifier is tasked with a three-way classification to ascertain whether a tumor originates from the liver, pancreas, or kidneys. In a similar test, radiologists examine the original CT images of these tumors. Reader study results on the left panel indicate significant challenges for both the SVM classifier and the radiologists in accurately identifying the origin of early-stage tumors. The precision and recall scores for both methods closely resemble those of random guessing. Additionally, on the right panel, we present a t-SNE visualization of Radiomics features for tumors from the liver, pancreas, and kidneys. These results highlight the considerable similarity in features and images of early-stage tumors.
Figure 3: Overview of the DiffTumor framework. Towards generalizable tumor synthesis, developing our DiffTumor involves three stages. ① Training an Autoencoder Model, consisting of an encoder and a decoder, to learn comprehensive latent features. The learning task here is image reconstruction performed on 9,262 unlabeled three-dimensional CT volumes. Both the trained encoder and decoder will be used in subsequent stages. ② Training a Diffusion Model, a specific type of generative model, using latent features and tumor masks as conditions. Once trained, this model can generate the latent features necessary for reconstructing CT volumes with tumors based on arbitrary masks. ③ Training a Segmentation Model using CT volumes of synthetic tumors, which are reconstructed by the decoder. With a large repository of healthy CT volumes, our DiffTumor framework can produce a vast array of synthetic tumors, varying in location, size, shape, texture, and intensity, thereby fostering high-performing AI models for tumor detection/segmentation.
Figure 4: Generalizable to various demographics. Tumor detection and segmentation enhancement for individuals across various age groups and genders. DiffTumor can consistently boost tumor detection and segmentation performance by a significant margin in each patient group. Results for additional segmentation backbones (e.g., nnU-Net and Swin UNETR) can be found in the appendix.
Figure 5: Reduced annotations for Diffusion Model. Diffusion Model, trained on annotated tumors in Stage ②, can generate synthetic tumors for the subsequent training of Segmentation Model in Stage ③. We investigate the relationship between the number of annotated real tumors required for Diffusion Model and the resultant performance of Segmentation Model. Results with varying numbers of annotated tumors reveal a surprising finding: extensive annotations are not necessary for tumor synthesis, contrary to the experience in computer vision [saharia2022palette, rombach2022high]. Notably, training Diffusion Model with only one annotated tumor seems to be sufficient. This efficiency is connected to our earlier observation that tumors, particularly in their early stages, tend to present similar appearances across different organs, thus facilitating the learning process of Diffusion Model with fewer annotated examples.
...and 11 more figures
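
Stage ② in the Figure 3 caption describes a diffusion model that operates on latent features and is conditioned on a tumor mask and the healthy latent. The sketch below illustrates one common way such conditioning is set up, using standard DDPM forward noising plus channel-wise concatenation. Everything specific here is an assumption for illustration and is not confirmed by this summary: the linear noise schedule, the 1,000 training steps, the latent shape, the mask being given at latent resolution, and the helper names `forward_noise` and `build_denoiser_input`.

```python
import numpy as np

def linear_alpha_bar(num_train_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative product of (1 - beta_t) for a linear schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_train_steps)
    return np.cumprod(1.0 - betas)

def forward_noise(z0, t, alpha_bar, rng):
    """Standard DDPM forward step:
    z_t = sqrt(alpha_bar_t) * z_0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.standard_normal(z0.shape)
    zt = np.sqrt(alpha_bar[t]) * z0 + np.sqrt(1.0 - alpha_bar[t]) * eps
    return zt, eps

def build_denoiser_input(zt, z_healthy, mask_latent):
    """Concatenate the noisy latent with both conditions along the channel axis
    (assumed conditioning mechanism; arrays are shaped (C, D, H, W))."""
    return np.concatenate([zt, z_healthy, mask_latent], axis=0)

rng = np.random.default_rng(0)
alpha_bar = linear_alpha_bar()

# Hypothetical latent shapes: 4 channels on an 8x8x8 grid.
z0_tumor = rng.standard_normal((4, 8, 8, 8))    # latent of a CT crop with a tumor
z0_healthy = rng.standard_normal((4, 8, 8, 8))  # latent of the healthy crop
mask = np.zeros((1, 8, 8, 8))
mask[:, 3:5, 3:5, 3:5] = 1.0                    # tumor mask at latent resolution

zt, eps = forward_noise(z0_tumor, t=500, alpha_bar=alpha_bar, rng=rng)
x = build_denoiser_input(zt, z0_healthy, mask)
print(x.shape)  # (9, 8, 8, 8): 4 noisy + 4 healthy + 1 mask channels
```

In a training loop, a denoising network would take `x` and the timestep and be trained to predict `eps`. At synthesis time the same conditions would be supplied while sampling, and the resulting latent would be passed through the Stage ① decoder to reconstruct a CT volume.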
