A Review on Generative AI For Text-To-Image and Image-To-Image Generation and Implications To Scientific Images
Zineb Sordo, Eric Chagnon, Daniela Ushizima
TL;DR
This review surveys state-of-the-art text-to-image and image-to-image generation with a focus on scientific imaging, comparing Variational Autoencoders, Generative Adversarial Networks, and Diffusion Models. It clarifies the core mechanisms, strengths, and limitations of each paradigm and highlights verification and validation (V&V) challenges, including hallucinations and biases, especially when dealing with novel scientific phenomena. The analysis identifies diffusion models, and in particular latent diffusion and diffusion-transformer variants, as delivering the best balance of image fidelity, controllability, and efficiency for scientific data augmentation. It also discusses practical directions such as integrating language models for better prompt understanding and establishing rigorous V&V workflows to ensure scientifically faithful outputs.
Abstract
This review surveys the state of the art in text-to-image and image-to-image generation within the scope of generative AI. We provide a comparative analysis of three prominent architectures: Variational Autoencoders, Generative Adversarial Networks, and Diffusion Models. For each, we elucidate core concepts, architectural innovations, and practical strengths and limitations, particularly for scientific image understanding. Finally, we discuss critical open challenges and potential future research directions in this rapidly evolving field.
