Analyzing Tumors by Synthesis

Qi Chen, Yuxiang Lai, Xiaoxi Chen, Qixin Hu, Alan Yuille, Zongwei Zhou

TL;DR

Case studies in the liver, pancreas, and kidneys reveal that AI trained on synthetic tumors can achieve performance comparable to, or better than, AI trained only on real data.

Abstract

Computer-aided tumor detection has shown great potential in enhancing the interpretation of over 80 million CT scans performed annually in the United States. However, challenges arise due to the rarity of CT scans with tumors, especially early-stage tumors. Developing AI with real tumor data faces issues of scarcity, annotation difficulty, and low prevalence. Tumor synthesis addresses these challenges by generating numerous tumor examples in medical images, aiding AI training for tumor detection and segmentation. Successful synthesis requires synthetic tumors that are realistic and that generalize across various organs. This chapter reviews AI development on real and synthetic data and summarizes two key trends in synthetic data for cancer imaging research: modeling-based and learning-based approaches. Modeling-based methods, like Pixel2Cancer, simulate tumor development over time using generic rules, while learning-based methods, like DiffTumor, learn from a few annotated examples in one organ to generate synthetic tumors in others. Reader studies with expert radiologists show that synthetic tumors can be convincingly realistic. We also present case studies in the liver, pancreas, and kidneys, which reveal that AI trained on synthetic tumors can achieve performance comparable to, or better than, AI trained only on real data. Tumor synthesis holds significant promise for expanding datasets, enhancing AI reliability, improving tumor detection performance, and preserving patient privacy.

Paper Structure

This paper contains 20 sections, 9 figures, and 3 tables.

Figures (9)

  • Figure 1: Can you distinguish synthetic data from real data in different modalities? (a) X-ray image examples [gao2023synthetic]. (b) CT image examples [hamamci2023generatect]. (c) MR image examples [park2021generative]. (d) PET image example. (e) Endoscopy image example [li2024endora]. (f) Histopathology image example [aversa2024diffinfinite].
  • Figure 2: Tumors in solid organs. (a) Hepatocellular carcinoma. (b) Thymoma. (c) Solid pseudopapillary tumor of the pancreas. (d) Pancreatic mucinous cystadenoma. (e) Pancreatic mucinous cystadenocarcinoma. (f) Pancreatic adenocarcinoma. (g) Neuroendocrine tumor in liver. (h) Meningioma. (i) Mediastinal lymphoma.
  • Figure 3: Tumors in tubular organs. (a) Gastrointestinal stromal tumor. (b) Sigmoid colon cancer. (c) Gastric cancer. (d) Lung metastases. (e) Intestinal carcinoid tumor. (f) Gallbladder carcinoma. (g) Gallbladder adenocarcinoma. (h) Colon cancer. (i) Cholangiocarcinoma.
  • Figure 4: Feature analysis and reader study. The left panel shows a t-SNE (t-distributed stochastic neighbor embedding) visualization that maps the multidimensional radiomics features of tumors from the liver, pancreas, and kidneys onto a two-dimensional space. The visualization shows substantial overlap in features among early-stage tumors from different organs, which may contribute to the difficulty of correctly identifying their organ of origin. Complementing these findings, the study evaluates a support vector machine (SVM) classifier that uses radiomics features [chu2019utility, wang2017comparison] to differentiate the organ types of the cropped tumors. The SVM classifier is trained to classify each tumor as originating from the liver, pancreas, or kidneys, a three-way classification task. In parallel, three expert radiologists conducted a similar evaluation by reviewing the original CT scans containing these tumors. The results in the right panel show that both the SVM classifier and the radiologists have significant difficulty pinpointing the origin of early-stage tumors: precision and recall for both approximate the performance expected from random selection.
  • Figure 5: Synthetic liver, pancreatic, and kidney tumors generated by Cellular Automata [lai2024pixel].
  • ...and 4 more figures
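
The Figure 4 caption compares precision and recall against "the performance expected from random selection". As a rough illustration of what that baseline looks like for a three-way organ classification, the sketch below scores a uniformly random guesser. It is not the paper's evaluation code: the function names, the sample size, and the assumption of equally many tumors per organ are all hypothetical choices made for the example.

```python
import random
from collections import Counter

# Three organ classes from the Figure 4 caption.
ORGANS = ("liver", "pancreas", "kidney")


def per_class_precision_recall(y_true, y_pred, labels=ORGANS):
    """Return {label: (precision, recall)} for a multi-class prediction."""
    pairs = Counter(zip(y_true, y_pred))   # (true, predicted) -> count
    true_counts = Counter(y_true)          # how often each label is correct
    pred_counts = Counter(y_pred)          # how often each label is predicted
    scores = {}
    for label in labels:
        tp = pairs[(label, label)]
        precision = tp / pred_counts[label] if pred_counts[label] else 0.0
        recall = tp / true_counts[label] if true_counts[label] else 0.0
        scores[label] = (precision, recall)
    return scores


def random_guess_baseline(n_per_class=2000, seed=0):
    """Score a guesser that picks an organ uniformly at random.

    Assumption (not from the paper): classes are balanced, with
    n_per_class tumors per organ.
    """
    rng = random.Random(seed)
    y_true = [organ for organ in ORGANS for _ in range(n_per_class)]
    y_pred = [rng.choice(ORGANS) for _ in y_true]
    return per_class_precision_recall(y_true, y_pred)


if __name__ == "__main__":
    for organ, (precision, recall) in random_guess_baseline().items():
        print(f"{organ}: precision={precision:.2f} recall={recall:.2f}")
```

Under the balanced-class assumption, both metrics settle near one third for every organ, which is the reference level the caption describes. With unbalanced classes, the chance-level recall stays near one third, while chance-level precision instead tracks each organ's share of the data.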