LesionDiffusion: Towards Text-controlled General Lesion Synthesis

Wenhui Lei, Henrui Tian, Linrui Dai, Hanyu Chen, Xiaofan Zhang

TL;DR

LesionDiffusion is a text-controllable lesion synthesis framework for 3D CT imaging that generates both lesions and their corresponding masks. It significantly improves downstream segmentation performance, generalizes well to unseen lesion types and organs, and outperforms current state-of-the-art models.

Abstract

Fully-supervised lesion recognition methods in medical imaging face challenges due to the reliance on large annotated datasets, which are expensive and difficult to collect. To address this, synthetic lesion generation has become a promising approach. However, existing models struggle with scalability, fine-grained control over lesion attributes, and the generation of complex structures. We propose LesionDiffusion, a text-controllable lesion synthesis framework for 3D CT imaging that generates both lesions and corresponding masks. By utilizing a structured lesion report template, our model provides greater control over lesion attributes and supports a wider variety of lesion types. We introduce a dataset of 1,505 annotated CT scans with paired lesion masks and structured reports, covering 14 lesion types across 8 organs. LesionDiffusion consists of two components: a lesion mask synthesis network (LMNet) and a lesion inpainting network (LINet), both guided by lesion attributes and image features. Extensive experiments demonstrate that LesionDiffusion significantly improves segmentation performance, with strong generalization to unseen lesion types and organs, outperforming current state-of-the-art models. Code is available at https://github.com/HengruiTianSJTU/LesionDiffusion.

Paper Structure

This paper contains 8 sections, 5 equations, 4 figures, 2 tables.

Figures (4)

  • Figure 1: Lesions and Structured Lesion Reports in the Training Stage. The LesionDiffusion model is trained on 14 types of lesions across 8 organs. The corresponding structured reports include 10 attributes, which are categorized into mask attributes and image attributes.
  • Figure 2: Overview of the LesionDiffusion framework. (a) LMNet is trained to generate lesion masks; (b) VQ-GAN is trained to compress 3D CT images into latent space and then reconstruct them; (c) LINet is trained to perform lesion inpainting in the latent space; (d) at inference, the framework generalizes to any lesion type through lesion attribute generation, lesion bounding box generation, lesion mask generation, and inpainting.
  • Figure 3: Generalization to hematencephalon. (a) Downstream segmentation results for hematencephalon. (b) Examples synthesized by LesionDiffusion.
  • Figure 4: Generation alignment with textual guidance. This figure illustrates the effect of shape, density, and density variation conditions on image inpainting results. (a) Green and blue dots represent the morphological metrics computed for round-like and irregular tumor volumes respectively, with red crosses indicating the metric mean values. (b) Comparison of generation control under the condition of density. (c) Comparison of generation control under the condition of density variation.
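
Figure 4(a) compares morphological metrics for round-like and irregular tumor volumes, but the caption does not say which metric is used. As an illustration only, the sketch below assumes Wadell sphericity, a common roundness measure that equals 1 for a perfect sphere and decreases as a shape becomes more irregular. The function name and the test shapes are hypothetical and are not taken from the paper or its code.

```python
import math


def sphericity(volume: float, surface_area: float) -> float:
    """Wadell sphericity: pi^(1/3) * (6V)^(2/3) / A.

    Assumption: this is one possible morphological metric, not
    necessarily the one plotted in Figure 4(a).
    """
    if volume <= 0 or surface_area <= 0:
        raise ValueError("volume and surface_area must be positive")
    return (math.pi ** (1 / 3)) * ((6 * volume) ** (2 / 3)) / surface_area


# Sanity checks on shapes with closed-form volume and surface area.
r = 10.0
sphere = sphericity(4 / 3 * math.pi * r**3, 4 * math.pi * r**2)

side = 10.0
cube = sphericity(side**3, 6 * side**2)

print(f"sphere: {sphere:.3f}, cube: {cube:.3f}")
```

For a segmented lesion, the volume and surface area would first have to be estimated from the voxel mask (for example, from a surface mesh), a step this sketch leaves out.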