UVMap-ID: A Controllable and Personalized UV Map Generative Model

Weijie Wang, Jichao Zhang, Chang Liu, Xia Li, Xingqian Xu, Humphrey Shi, Nicu Sebe, Bruno Lepri

TL;DR

UVMap-ID generates personalized UV texture maps for 3D avatars by fine-tuning a pre-trained text-to-image diffusion model, augmented with a face fusion module, on a small attribute-balanced dataset. A prior preservation loss maintains the model's general text-to-image capabilities while enabling ID-driven personalization. The paper also introduces metrics for evaluating UV textures, demonstrates controllable, identity-preserving texture generation with competitive quantitative results, and creates the CelebA-HQ-UV dataset for community use. Because training is efficient, the approach makes personalized, text-guided UV map synthesis practical for 3D avatar creation.
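As a rough illustration of the prior preservation loss mentioned above, the sketch below assumes the DreamBooth-style form: a denoising error on the personalized (instance) sample plus a weighted denoising error on a sample generated by the frozen pre-trained model. The summary does not state the exact formulation used by UVMap-ID, so the function names, arguments, and the `prior_weight` parameter are hypothetical.

```python
def mse(pred, target):
    """Mean squared error between two equally sized flat sequences."""
    if len(pred) != len(target):
        raise ValueError("pred and target must have the same length")
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def prior_preservation_loss(noise_pred, noise,
                            prior_noise_pred, prior_noise,
                            prior_weight=1.0):
    """Assumed DreamBooth-style objective (not confirmed by the source).

    noise_pred / noise: predicted and true noise for the personalized sample.
    prior_noise_pred / prior_noise: the same pair for a sample produced by
    the frozen pre-trained model, which anchors the general prior.
    """
    instance_term = mse(noise_pred, noise)
    prior_term = mse(prior_noise_pred, prior_noise)
    return instance_term + prior_weight * prior_term
```

A larger `prior_weight` pulls the fine-tuned model back toward the behavior of the pre-trained model, while a smaller value favors fitting the personalized data.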

Abstract

Recently, diffusion models have made significant strides in synthesizing realistic 2D human images based on provided text prompts. Building upon this, researchers have extended 2D text-to-image diffusion models into the 3D domain for generating human textures (UV Maps). However, two important problems for UV Map generative models remain unsolved: how to generate personalized texture maps for any given face image, and how to define and evaluate the quality of these generated texture maps. To address these problems, we introduce a novel method, UVMap-ID, which is a controllable and personalized UV Map generative model. Unlike traditional large-scale training methods in 2D, we propose to fine-tune a pre-trained text-to-image diffusion model integrated with a face fusion module to achieve ID-driven customized generation. To support the fine-tuning strategy, we introduce a small-scale attribute-balanced training dataset, including high-quality textures with labeled text and Face ID. Additionally, we introduce several metrics to evaluate multiple aspects of the textures. Finally, both quantitative and qualitative analyses demonstrate the effectiveness of our method in controllable and personalized UV Map generation. Code is publicly available via https://github.com/twowwj/UVMap-ID.

Paper Structure (16 sections, 5 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: Left: overview of our proposed pipeline. Given a reference image as the face ID, we use a pre-trained text-to-image diffusion model whose input combines a noised UV Map with a text prompt of the form "A [S] Texturemap of [P]", where [S] is a unique identifier and [P] represents the race and gender of the portrait. To maintain the quality of images generated by the pre-trained model and to process textual features effectively, we adopt a prior preservation loss. Right: detailed architecture of our model. A facial recognition model and face projection layers map facial information to the same dimensions as the text embeddings. We then merge facial and textual information via decoupled cross-attention, which is integrated into the pre-trained text-to-image model.
  • Figure 2: Personalized texture generation results using face IDs from the CelebA-HQ dataset.
  • Figure 3: From left to right: UV structures, textures from SMPLitex, extracted semantic segmentation, and semantic ground truth.
  • Figure 4: Our personalized generation results. The first column shows reference faces, which were obtained from the web and do not appear in our training set.
  • Figure 5: Comparison with SMPLitex (Casas and Comino-Trinidad, 2023) results. SMPLitex is not an image ID-driven method, so we provided the celebrities' names in the test prompts for SMPLitex but not for ours. Taking "Betty Sun" (upper-left corner) as an example, the test prompt for SMPLitex is "a texturemap of Betty Sun wearing...", and our test prompt is "a texturemap of Asian woman wearing...". Note that these image IDs do not appear in our training data.
  • ...and 1 more figure
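The decoupled cross-attention described in the Figure 1 caption can be sketched as follows. The caption does not give the fusion formula, so this minimal example assumes the common IP-Adapter-style form, in which the same queries attend separately to text tokens and to projected face tokens and the two results are summed: output = Attn(Q, K_text, V_text) + face_scale · Attn(Q, K_face, V_face). All function names and the `face_scale` parameter are hypothetical, and keys and values are taken as already projected.

```python
import math


def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def softmax(row):
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(q, k, v):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V."""
    scale = math.sqrt(len(q[0]))
    scores = matmul(q, transpose(k))
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v)


def decoupled_cross_attention(q, text_kv, face_kv, face_scale=1.0):
    """Sum of two independent cross-attention branches (assumed form).

    text_kv and face_kv are (keys, values) pairs. The face tokens are
    assumed to have been mapped to the text embedding dimension already,
    as the face projection layers in the caption do.
    """
    text_out = attention(q, *text_kv)
    face_out = attention(q, *face_kv)
    return [[t + face_scale * f for t, f in zip(t_row, f_row)]
            for t_row, f_row in zip(text_out, face_out)]


# Tiny example: two image queries, two text tokens, one face token.
queries = [[1.0, 0.0], [0.0, 1.0]]
text_kv = ([[1.0, 0.0], [0.0, 1.0]], [[1.0, 2.0], [3.0, 4.0]])
face_kv = ([[1.0, 1.0]], [[10.0, 20.0]])
fused = decoupled_cross_attention(queries, text_kv, face_kv, face_scale=0.5)
```

Keeping the two branches separate means the text branch can be left as in the pre-trained model while only the face branch is newly trained, and `face_scale` offers a simple knob for how strongly the reference identity influences the result.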