A Review on Generative AI For Text-To-Image and Image-To-Image Generation and Implications To Scientific Images

Zineb Sordo; Eric Chagnon; Daniela Ushizima

TL;DR

This review surveys state-of-the-art text-to-image and image-to-image generation with a focus on scientific imaging, comparing Variational Autoencoders, Generative Adversarial Networks, and Diffusion Models. It clarifies the core mechanisms, strengths, and limitations of each paradigm and highlights verification and validation (V&V) challenges, including hallucinations and biases, especially when dealing with novel scientific phenomena. The analysis identifies diffusion models, in particular latent diffusion and diffusion-transformer variants, as offering the best balance of image fidelity, controllability, and efficiency for scientific data augmentation. It also discusses practical directions such as integrating language models for better prompt understanding and establishing rigorous V&V workflows to ensure scientifically faithful outputs.

Abstract

This review surveys the state of the art in text-to-image and image-to-image generation within the scope of generative AI. We provide a comparative analysis of three prominent architectures: Variational Autoencoders, Generative Adversarial Networks, and Diffusion Models. For each, we elucidate core concepts, architectural innovations, and practical strengths and limitations, particularly for scientific image understanding. Finally, we discuss critical open challenges and potential future research directions in this rapidly evolving field.
