A Review on Generative AI For Text-To-Image and Image-To-Image Generation and Implications To Scientific Images

Zineb Sordo, Eric Chagnon, Daniela Ushizima

TL;DR

This review surveys state-of-the-art text-to-image and image-to-image generation with a focus on scientific imaging, comparing Variational Autoencoders, Generative Adversarial Networks, and Diffusion Models. It clarifies the core mechanisms, strengths, and limitations of each paradigm and highlights verification and validation (V&V) challenges, including hallucinations and biases, especially when dealing with novel scientific phenomena. The analysis identifies diffusion models, particularly latent diffusion and diffusion-transformer variants, as delivering the best balance of image fidelity, controllability, and efficiency for scientific data augmentation. It also discusses practical directions such as integrating language models for better prompt understanding and establishing rigorous V&V workflows to ensure scientifically faithful outputs.

Abstract

This review surveys the state of the art in text-to-image and image-to-image generation within the scope of generative AI. We provide a comparative analysis of three prominent architectures: Variational Autoencoders, Generative Adversarial Networks, and Diffusion Models. For each, we elucidate core concepts, architectural innovations, and practical strengths and limitations, particularly for scientific image understanding. Finally, we discuss critical open challenges and potential future research directions in this rapidly evolving field.

Paper Structure

This paper contains 17 sections, 23 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: Major highlights of language and multimodal models, with less focus on text-to-image generation models. Source: lifearchitect
  • Figure 2: Variational Inference
  • Figure 3: VAE encoder-decoder architecture
  • Figure 4: Architecture of the Vanilla GAN
  • Figure 5: Denoising diffusion probabilistic models (DDPMs). Source: Kulkarni2023
  • ...and 4 more figures