Personalized Image Generation with Deep Generative Models: A Decade Survey
Yuxiang Wei, Yiheng Zheng, Yabo Zhang, Ming Liu, Zhilong Ji, Lei Zhang, Wangmeng Zuo
TL;DR
This decade-spanning survey introduces a unified framework for personalized image generation across GANs, diffusion models, and multi-modal autoregressive models. It decomposes personalization into concept inversion and personalized generation, described through three shared components: inversion spaces, inversion methods, and personalization schemes. The survey catalogs and compares techniques across model families, detailing GAN inversion (optimization, learning, hybrid) and diffusion-based inversion (training-free, optimization, learning, hybrid), with extensive coverage of subject, face, style, and high-level semantic personalization, as well as text-driven editing and multi-concept scenarios. It also reviews evaluation datasets and metrics (FID, ID, CLIP/DINO similarities, text-editability scores, and human studies) and discusses open challenges such as balancing subject fidelity with text controllability, universal category personalization, multi-condition generation, and extensions to video and 3D. By synthesizing methods, proposing a common vocabulary, and outlining future directions, the paper offers a practical roadmap for researchers and developers working on personalized content generation in real-world applications.
Abstract
Recent advancements in generative models have significantly facilitated the development of personalized content creation. Given a small set of images depicting a user-specific concept, personalized image generation creates images that incorporate the specified concept and adhere to the provided text descriptions. Owing to its wide applications in content creation, significant effort has been devoted to this field in recent years. Nonetheless, personalization techniques have evolved alongside generative models, and their components are distinct yet interrelated. In this survey, we present a comprehensive review of generalized personalized image generation across various generative models, including traditional GANs, contemporary text-to-image diffusion models, and emerging multi-modal autoregressive models. We first define a unified framework that standardizes the personalization process across different generative models and comprises three key components: inversion spaces, inversion methods, and personalization schemes. This framework offers a structured approach to dissecting and comparing personalization techniques across generative architectures. Building on it, we provide an in-depth analysis of the personalization techniques within each generative model, highlighting their unique contributions and innovations. Through comparative analysis, this survey elucidates the current landscape of personalized image generation, identifying the commonalities and distinguishing features of existing methods. Finally, we discuss open challenges in the field and propose potential directions for future research. We continue to track related works at https://github.com/csyxwei/Awesome-Personalized-Image-Generation.
