
Two-stage Cytopathological Image Synthesis for Augmenting Cervical Abnormality Screening

Zhenrong Shen, Manman Fei, Xin Wang, Jiangdong Cai, Sheng Wang, Lichi Zhang, Qian Wang

TL;DR

This work tackles data scarcity in automated cervical abnormality screening by introducing a two-stage data synthesis framework based on Stable Diffusion. In the first stage, a Normal Image Generator fine-tuned with LoRA produces high-resolution background images containing only NILM (negative for intraepithelial lesion or malignancy) cells. In the second, editing stage, an Abnormal Cell Synthesizer converts selected NILM cells into abnormal types using bounding-box conditioning and gated self-attention, so the resulting synthetic images come with annotations. Quantitative and qualitative results show improved realism of the synthetic images, as measured by FID, and significant gains in downstream abnormal-cell detection, with purely synthetic data offering augmentation competitive with or superior to real-to-synthetic data. The approach shows strong potential for scalable data augmentation across domains; the authors acknowledge limitations, including gigapixel whole-slide image (WSI) generation, synthesis speed, and dataset diversity, as directions for future work.
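For reference, FID (Fréchet Inception Distance) fits a Gaussian to Inception-network features of real images and another to features of synthetic images, then measures the distance between the two; lower values mean the synthetic distribution is closer to the real one. The standard definition, with feature means $\mu_r, \mu_g$ and covariances $\Sigma_r, \Sigma_g$ for real and generated images, is shown below. This summary does not state which feature extractor settings or sample sizes the authors used.

```latex
\mathrm{FID} = \lVert \mu_r - \mu_g \rVert_2^2
  + \operatorname{Tr}\!\left( \Sigma_r + \Sigma_g - 2\,(\Sigma_r \Sigma_g)^{1/2} \right)
```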

Abstract

Automatic thin-prep cytologic test (TCT) screening can assist pathologists in detecting cervical abnormalities for accurate and efficient cervical cancer diagnosis. Current automatic TCT screening systems are mostly built around abnormal cervical cell detection, which generally requires large-scale, diverse training data with high-quality annotations to perform well. Pathological image synthesis is a natural way to reduce the effort of data collection and annotation. However, it is challenging to generate realistic large-size cytopathological images while also synthesizing visually plausible appearances for small abnormal cervical cells. In this paper, we propose a two-stage image synthesis framework that creates synthetic data for augmenting cervical abnormality screening. In the first, Global Image Generation stage, a Normal Image Generator produces cytopathological images containing only normal cervical cells. In the second, Local Cell Editing stage, normal cells are randomly selected from the generated images and converted into different types of abnormal cells by the proposed Abnormal Cell Synthesizer. Both the Normal Image Generator and the Abnormal Cell Synthesizer are built on Stable Diffusion, a pre-trained foundation model for image synthesis, using parameter-efficient fine-tuning methods to customize cytopathological image content and to extend spatial layout controllability, respectively. Our experiments demonstrate the quality, diversity, and controllability of the images produced by the proposed framework, and validate its effectiveness as data augmentation for improving abnormal cervical cell detection.
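The bookkeeping in the Local Cell Editing stage can be sketched as follows, assuming the cell detector returns axis-aligned pixel boxes. `plan_local_cell_edits`, `Box`, `EditRequest`, and the prompt template are hypothetical names for illustration, not the authors' code, and the Abnormal Cell Synthesizer itself is not shown. The sketch makes one point concrete: the boxes chosen for editing, together with their assigned types, are exactly the detection labels of the synthetic image, which is why the output arrives already annotated.

```python
import random
from dataclasses import dataclass

# Target abnormal types, taken from the cell types shown in Figure 1.
ABNORMAL_TYPES = ("ASC-US", "ASC-H", "LSIL", "HSIL")


@dataclass(frozen=True)
class Box:
    """Axis-aligned pixel box from (x0, y0) to (x1, y1) around one detected cell."""
    x0: int
    y0: int
    x1: int
    y1: int


@dataclass(frozen=True)
class EditRequest:
    """One cell to convert: where it is, what it becomes, and its text condition."""
    box: Box
    cell_type: str
    prompt: str


def plan_local_cell_edits(nilm_boxes, num_edits, rng, cell_types=ABNORMAL_TYPES):
    """Randomly select NILM cells and assign each a target abnormal type.

    nilm_boxes: boxes of normal cells detected in a generated NILM image.
    Returns one EditRequest per selected cell. The (box, cell_type) pairs
    double as the detection annotations of the edited synthetic image.
    """
    if num_edits > len(nilm_boxes):
        raise ValueError("cannot select more cells than were detected")
    selected = rng.sample(list(nilm_boxes), num_edits)  # without replacement
    requests = []
    for box in selected:
        cell_type = rng.choice(cell_types)
        # Hypothetical prompt template; the paper's actual prompts may differ.
        requests.append(EditRequest(box, cell_type, f"a {cell_type} cervical cell"))
    return requests


if __name__ == "__main__":
    rng = random.Random(0)
    detected = [Box(40 * i, 40 * i, 40 * i + 32, 40 * i + 32) for i in range(8)]
    for req in plan_local_cell_edits(detected, 3, rng):
        print(req.cell_type, req.box)
```

Passing a seeded `random.Random` instance keeps the selection reproducible, which is convenient when regenerating a synthetic dataset.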

Paper Structure

This paper contains 18 sections, 5 equations, 8 figures, and 3 tables.

Figures (8)

  • Figure 1: A real abnormal cervical cytopathological image and four real abnormal cell patches including ASC-US, ASC-H, LSIL, and HSIL are displayed on the left. Two examples of synthetic abnormal cervical cytopathological images produced by our proposed framework are shown on the right.
  • Figure 2: Our proposed cervical cytopathological image synthesis framework consists of two stages: (a) Global Image Generation. Stable Diffusion is adapted into a Normal Image Generator using Low-Rank Adaptation (LoRA) to create high-resolution normal cytopathological images containing only NILM cells. (b) Local Cell Editing. A number of synthetic NILM cells are randomly selected and marked with bounding boxes by a simple in-house cell detector. Conditioned on the input texts and the bounding boxes, the proposed Abnormal Cell Synthesizer translates these selected cells into abnormal cells of user-defined types, yielding diverse annotated abnormal cervical cytopathological images for data augmentation.
  • Figure 3: The pre-trained Stable Diffusion autoencoder can reconstruct both normal and abnormal cytopathological images well with negligible errors, suggesting that only the denoising U-Net needs to be fine-tuned.
  • Figure 4: Low-Rank Adaptation (LoRA) is applied to the query, key, value, and output projection matrices of the pre-trained cross-attention modules. The LoRA weights learn to align image features with text embeddings for cytopathological image synthesis.
  • Figure 5: Detailed scheme of the Local Cell Editing stage: (a) the editing process of the Abnormal Cell Synthesizer, (b) the construction of the conditioning feature, and (c) the structure of the gated self-attention layer.
  • ...and 3 more figures
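The Figure 4 caption describes LoRA on the cross-attention projections. As a minimal, dependency-free sketch of the standard LoRA formulation (not the authors' code), a frozen weight matrix W is augmented with a trainable low-rank product B A scaled by alpha / rank. The rank, alpha, initialization scale, and the projection names "q", "k", "v", "out" are illustrative assumptions; this summary does not give the paper's settings.

```python
import random


def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(w * x for w, x in zip(row, vec)) for row in matrix]


class LoRALinear:
    """A frozen projection W plus a trainable low-rank update.

    y = W x + (alpha / rank) * B (A x)
    W: d_out x d_in (frozen), A: rank x d_in, B: d_out x rank.
    B starts at zero, so before training the layer reproduces W exactly.
    """

    def __init__(self, weight, rank, alpha, rng):
        self.weight = weight  # pre-trained weights, never updated
        d_out, d_in = len(weight), len(weight[0])
        self.scale = alpha / rank
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(rank)]
        self.B = [[0.0] * rank for _ in range(d_out)]

    def __call__(self, x):
        base = matvec(self.weight, x)
        update = matvec(self.B, matvec(self.A, x))
        return [b + self.scale * u for b, u in zip(base, update)]

    def num_trainable(self):
        """Only A and B are trained: rank * (d_in + d_out) values."""
        return len(self.A) * len(self.A[0]) + len(self.B) * len(self.B[0])


def add_lora_to_cross_attention(projections, rank, alpha, rng):
    """Wrap the query/key/value/output projections named in Figure 4.

    `projections` maps hypothetical names "q", "k", "v", "out" to weights.
    """
    return {name: LoRALinear(projections[name], rank, alpha, rng)
            for name in ("q", "k", "v", "out")}
```

Zero-initializing B is the usual choice because fine-tuning then starts exactly from the pre-trained Stable Diffusion behavior, and each projection trains only rank * (d_in + d_out) values instead of d_in * d_out.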
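Figure 5(b) names the construction of the conditioning feature without detailing it here. One plausible construction, assuming a GLIGEN-style design that this summary does not confirm, concatenates a text embedding of the target cell type with sinusoidal Fourier features of the cell's normalized bounding box; a small trainable MLP (omitted) would then map the result to a grounding token. The function names and the `num_freqs` default are hypothetical.

```python
import math


def fourier_embed(values, num_freqs):
    """Sinusoidal (Fourier) features of scalars in [0, 1].

    For each value v and frequency index k, emits sin(2^k * pi * v) and
    cos(2^k * pi * v), giving 2 * num_freqs features per value.
    """
    feats = []
    for v in values:
        for k in range(num_freqs):
            angle = (2 ** k) * math.pi * v
            feats.extend((math.sin(angle), math.cos(angle)))
    return feats


def normalize_box(box, width, height):
    """Map a pixel box (x0, y0, x1, y1) to [0, 1] image coordinates."""
    x0, y0, x1, y1 = box
    return (x0 / width, y0 / height, x1 / width, y1 / height)


def grounding_feature(text_embedding, box, width, height, num_freqs=8):
    """Concatenate a text embedding with Fourier features of its box.

    In a GLIGEN-style model this vector would pass through a trainable MLP
    to become one grounding token; that MLP is omitted here.
    """
    coords = normalize_box(box, width, height)
    return list(text_embedding) + fourier_embed(coords, num_freqs)
```

Normalizing coordinates first makes the box encoding independent of image resolution, and the multi-frequency sinusoids let a downstream layer resolve both coarse position and fine box edges.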
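For the gated self-attention layer of Figure 5(c), a minimal sketch assuming the GLIGEN-style formulation (not confirmed by this summary) is a residual update of the visual tokens: attention runs over the visual tokens concatenated with the grounding tokens, only the outputs at visual positions are kept, and the update is scaled by tanh(gamma) with a learnable gamma. The Q/K/V projections are replaced by identities for brevity, and grounding tokens are assumed to already share the visual token dimension; real layers use learned projections.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def self_attention(tokens):
    """Single-head scaled dot-product attention with identity Q/K/V projections."""
    d = len(tokens[0])
    out = []
    for q in tokens:
        weights = softmax([dot(q, k) / math.sqrt(d) for k in tokens])
        out.append([sum(w * v[i] for w, v in zip(weights, tokens)) for i in range(d)])
    return out


def gated_self_attention(visual, grounding, gamma):
    """Residual update of visual tokens, gated by tanh(gamma).

    Attention runs over [visual; grounding], but only outputs at visual
    positions are kept, so the number of visual tokens is unchanged.
    """
    attended = self_attention(visual + grounding)[: len(visual)]
    gate = math.tanh(gamma)
    return [[v + gate * a for v, a in zip(vt, at)] for vt, at in zip(visual, attended)]
```

With gamma initialized to 0 the gate is closed and the layer is an exact identity, so inserting it into a pre-trained U-Net does not disturb the original model at the start of fine-tuning.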