Generating Realistic X-ray Scattering Images Using Stable Diffusion and Human-in-the-loop Annotations

Zhuowen Zhao, Xiaoya Chong, Tanny Chavez, Alexander Hexemer

TL;DR

The paper generates realistic X-ray scattering images for scientific use by fine-tuning a Stable Diffusion model on domain data and adding a continuous human-in-the-loop framework that filters out artifacts. It combines latent-diffusion generation with iterative human annotations and an ensemble of vision classifiers (including ResNet-50 and Vision Transformers) that separate realistic from unrealistic outputs, with classifier performance improving over successive training rounds. Latent-space analysis and diffraction-law checks indicate that generated images can respect the underlying physical constraints, which supports their use for data augmentation and digital twin development at scientific facilities. The work offers a practical, resource-conscious approach and releases code to enable broader adoption in domain-specific image synthesis tasks.

Abstract

We fine-tuned a foundational stable diffusion model using X-ray scattering images and their corresponding descriptions to generate new scientific images from given prompts. However, some of the generated images exhibit significant unrealistic artifacts, commonly known as "hallucinations". To address this issue, we trained various computer vision models to detect unrealistic images, using a dataset composed of 60% human-approved generated images and 40% experimental images. The classified images were then reviewed and corrected by human experts, and subsequently used to further refine the classifiers in subsequent rounds of training and inference. Our evaluations demonstrate the feasibility of generating high-fidelity, domain-specific images using a fine-tuned diffusion model. We anticipate that generative AI will play a crucial role in enhancing data augmentation and driving the development of digital twins in scientific research facilities.
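
The loop described in the abstract can be illustrated with a minimal Python sketch. Only the 60%/40% mix and the review-then-retrain cycle come from the abstract; the function names (`build_training_set`, `ensemble_vote`, `apply_human_review`), the majority-vote rule, and the tie-breaking choice are assumptions made for illustration and are not taken from the paper or its released code.

```python
import random
from collections import Counter


def build_training_set(generated_approved, experimental, n_total,
                       gen_fraction=0.6, seed=0):
    """Sample a mixed training set.

    gen_fraction=0.6 mirrors the 60% human-approved generated /
    40% experimental split stated in the abstract.
    """
    rng = random.Random(seed)
    n_gen = round(n_total * gen_fraction)
    n_exp = n_total - n_gen
    if n_gen > len(generated_approved) or n_exp > len(experimental):
        raise ValueError("not enough images to honour the requested mix")
    mixed = rng.sample(generated_approved, n_gen) + rng.sample(experimental, n_exp)
    rng.shuffle(mixed)
    return mixed


def ensemble_vote(predictions):
    """Combine per-classifier labels by majority vote.

    Assumption: the voting rule is hypothetical. Ties resolve to
    "unrealistic" so that doubtful images are sent to human review.
    """
    counts = Counter(predictions)
    if counts["realistic"] > counts["unrealistic"]:
        return "realistic"
    return "unrealistic"


def apply_human_review(machine_labels, corrections):
    """Overlay expert corrections on the machine labels.

    The returned labels would feed the next training round.
    """
    reviewed = dict(machine_labels)
    reviewed.update(corrections)
    return reviewed


if __name__ == "__main__":
    gen = [f"gen_{i}" for i in range(100)]
    exp = [f"exp_{i}" for i in range(100)]
    train = build_training_set(gen, exp, n_total=50)
    print(len(train), sum(name.startswith("gen_") for name in train))
```

In this sketch each round would call `ensemble_vote` on the outputs of the individual classifiers, pass the result through `apply_human_review`, and rebuild the training set from the corrected labels before retraining.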

Paper Structure (19 sections, 1 equation, 8 figures, 3 tables)

Figures (8)

  • Figure 1: Schematic of the pipeline for creating realistic X-ray scattering images using generative AI and human annotations. The bold black arrows indicate the process of generating data from an input prompt.
  • Figure 2: Examples of experimental images used for fine-tuning the Diffusers model.
  • Figure 3: Label Maker and MLCoach web interface
  • Figure 4: Continuous training with human-in-the-loop annotations and ensemble classification
  • Figure 5: Realistic and non-realistic (fake) X-ray scattering patterns generated by the fine-tuned Diffusers model.
  • ...and 3 more figures