Personalized Image Generation with Deep Generative Models: A Decade Survey

Yuxiang Wei, Yiheng Zheng, Yabo Zhang, Ming Liu, Zhilong Ji, Lei Zhang, Wangmeng Zuo

TL;DR

This decade-spanning survey introduces a unified framework for personalized image generation across GANs, diffusion models, and multi-modal autoregressive models, breaking personalization into concept inversion and generation via shared components: inversion spaces, inversion methods, and personalization schemes. It inventories, contrasts, and systematizes techniques across model families, detailing GAN inversion (optimization, learning, hybrid) and diffusion-based inversion (training-free, optimization, learning, hybrid) with extensive coverage of subject, face, style, and high-level semantic personalization, plus text-driven editing and multi-concept scenarios. The article also surveys evaluation datasets and metrics (FID, ID, CLIP/DINO similarities, text-editability scores, and human studies) and discusses open challenges such as balancing subject fidelity with text controllability, universal category personalization, multi-condition generation, and extensions to video/3D. By synthesizing methods, proposing a common vocabulary, and outlining future directions, the paper provides a practical roadmap for researchers and developers pursuing personalized content generation in real-world applications.

Abstract

Recent advancements in generative models have significantly facilitated the development of personalized content creation. Given a small set of images of a user-specific concept, personalized image generation creates images that incorporate the specified concept and adhere to the provided text descriptions. Due to its wide applications in content creation, significant effort has been devoted to this field in recent years. Nonetheless, the technologies used for personalization have evolved alongside the development of generative models, each with distinct yet interrelated components. In this survey, we present a comprehensive review of generalized personalized image generation across various generative models, including traditional GANs, contemporary text-to-image diffusion models, and emerging multi-modal autoregressive models. We first define a unified framework that standardizes the personalization process across different generative models, encompassing three key components, i.e., inversion spaces, inversion methods, and personalization schemes. This unified framework offers a structured approach to dissecting and comparing personalization techniques across different generative architectures. Building upon this unified framework, we further provide an in-depth analysis of personalization techniques within each generative model, highlighting their unique contributions and innovations. Through comparative analysis, this survey elucidates the current landscape of personalized image generation, identifying commonalities and distinguishing features among existing methods. Finally, we discuss the open challenges in the field and propose potential directions for future research. We keep tracking related work at https://github.com/csyxwei/Awesome-Personalized-Image-Generation.

Paper Structure

This paper contains 47 sections, 9 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: The approximate number of papers on personalized image generation with deep generative models, with representative works on the personalization task shown over time. GAN-based, diffusion-based, and autoregressive-based methods are highlighted in blue, orange, and green, respectively.
  • Figure 2: Taxonomy of Personalized Image Generation.
  • Figure 3: Illustration of different Generative Models.
  • Figure 4: Generalized personalized image generation with generative models. (a)-(b) Generalized personalized image generation with Generative Adversarial Networks (GANs). GAN inversion first maps real images into a GAN's latent space, from which the input images can be reconstructed. Then, latent editing is performed to generate personalized concepts with various attributes. The editing directions can be either predefined or derived from text. (c) Text-driven image editing with text-to-image diffusion models. After inverting the given image into noise space, text-driven editing techniques are applied to create target concepts with desired attributes specified by text. (d) Personalized image generation with text-to-image diffusion models. The target concept is inverted into the representation space of diffusion models, and the result is then directly combined with text prompts to generate the desired personalized images. (e) Personalized image generation with multi-modal autoregressive models. Images and text are encoded into a shared latent space, enabling this information to be integrated to generate the target personalized images.
  • Figure 5: Inversion spaces of different generative models. (a) For GAN-based personalization methods, the concept can be inverted into generalized style space, feature space, or parameter space. (b) For diffusion-based personalization methods, the concept can be inverted into noise space, textual space, feature space, or parameter space. (c) For AR-based personalization methods, the concept is typically encoded into a shared space with text, referred to here as token space.
  • ...and 4 more figures
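
The invert-then-edit pipeline described in the Figure 4(a)-(b) caption can be illustrated with a toy sketch. This is not the survey's method or any real GAN: the "generator" is a hypothetical fixed linear map, the names `generate`, `invert`, and `edit` are invented for illustration, and the editing direction is assumed to be predefined. It only shows the shape of optimization-based inversion (fit a latent code by minimizing reconstruction error) followed by latent editing (shift the code along a direction).

```python
# Toy stand-in for a pretrained generator: G(w) = A w + b (an assumption,
# chosen so that gradient descent provably converges for this step size).
A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
BIAS = [0.5, -0.5, 0.0, 1.0]


def generate(w):
    """Map a 2-d latent code to a 4-d 'image' vector."""
    return [sum(a * wj for a, wj in zip(row, w)) + b for row, b in zip(A, BIAS)]


def invert(target, steps=100, lr=0.1):
    """Optimization-based inversion: gradient descent on ||G(w) - target||^2."""
    w = [0.0, 0.0]
    for _ in range(steps):
        residual = [g - t for g, t in zip(generate(w), target)]
        # Gradient of the squared error with respect to w: 2 * A^T * residual.
        grad = [2.0 * sum(A[i][j] * residual[i] for i in range(len(A)))
                for j in range(len(w))]
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


def edit(w, direction, strength):
    """Latent editing: move the inverted code along an editing direction."""
    return [wj + strength * dj for wj, dj in zip(w, direction)]


# Invert a 'real image' produced by a hidden latent, then edit and regenerate.
real_image = generate([0.8, -0.3])
w_hat = invert(real_image)
edited_image = generate(edit(w_hat, direction=[1.0, 0.0], strength=0.5))
```

In an actual GAN the generator is nonlinear, the loss usually adds perceptual or identity terms, and the direction may be derived from text, so this sketch conveys only the control flow.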