Pro-Pose: Unpaired Full-Body Portrait Synthesis via Canonical UV Maps

Sandeep Mishra, Yasamin Jafarian, Andreas Lugmayr, Yingwei Li, Varsha Ramakrishnan, Srivatsan Varadharajan, Alan C. Bovik, Ira Kemelmacher-Shlizerman

TL;DR

Pro-Pose introduces a self-supervised framework for unpaired full-body portrait synthesis in canonical UV space, decoupling pose from texture to enable robust reposing from a single image. A novel Donor-based UV Reposing mechanism prevents pose leakage through occlusion boundaries, allowing learning from large unpaired datasets combined with scarce paired data. The model uses a Flow Matching–based generator in a latent UV space and supports test-time personalization via LoRA-based fine-tuning, producing identity-faithful avatars under novel poses. Across DeepFashion and WPose benchmarks, Pro-Pose achieves state-of-the-art fidelity and strong generalization to in-the-wild imagery, with ablations highlighting the importance of the hybrid data strategy and the personalization capability.

Abstract

Photographs taken by professional photographers typically present the person in beautiful lighting, in an interesting pose, and at flattering quality, unlike the photos people commonly take of themselves. In this paper, we explore how to create a "professional" version of a person's photograph, i.e., in a chosen pose, in a simple environment, with good lighting, and in standard black top/bottom clothing. A key challenge is to preserve the person's unique identity, face, and body features while transforming the photo. If a large paired dataset existed of the same person photographed both "in the wild" and by a professional photographer, the problem would potentially be easier to solve. However, such data does not exist, especially for a large variety of identities. To that end, we propose two key insights: 1) Our method transforms the input photo and the person's face to a canonical UV space, which is further coupled with a reposing methodology to model occlusions and novel view synthesis. Operating in UV space allows us to leverage existing unpaired datasets. 2) We personalize the output photo via multi-image fine-tuning. Our approach yields high-quality, reposed portraits and achieves strong qualitative and quantitative performance on real-world imagery.

Paper Structure

This paper contains 32 sections, 11 equations, 16 figures, 4 tables.

Figures (16)

  • Figure 1: Pro-Pose Avatar Generation. From a single in-the-wild photo (top row), Pro-Pose generates a portfolio of high-fidelity avatars (bottom rows), fully driven by arbitrary SMPL-X poses [Pavlakos2019SMPLX]. The appearance is canonicalized to a black tank top and shorts in a photoshoot-style environment. Our method preserves the user's identity, facial features, and body shape across novel views and complex poses.
  • Figure 2: Pose Leakage Mitigation. (a) Standard partial textures $\mathbf{T_p}$ leak source pose information via occlusion boundaries, allowing trivial reconstruction shortcuts. (b) Generating pseudo-pairs via image-space rendering is computationally prohibitive for online training. (c) Our Donor-based Reposing efficiently bypasses rendering by applying random donor masks $\mathbf{M}_{\tilde{\mathbf{p}}}$ directly in UV space. This simulates novel occlusions, preventing leakage and forcing the network to learn robust geometric warping.
  • Figure 3: Overview of our Avatar Generation Framework. Our approach leverages single-view datasets by operating in a canonical UV space, extracting UV texture and pose [Pavlakos2019SMPLX]. Left (Paired Supervision): When ground-truth pose pairs are available, we condition the Flow Matching model directly on the target pose and face crop. Right (Single-View Self-Supervision): To prevent "pose leakage" from occlusion boundaries when training on single images, we introduce a Donor-based UV Reposing module (Sec. \ref{sec:UVreposing}). This synthetically re-poses the input texture using a random donor visibility mask, forcing the model to learn robust identity representations. Furthermore, we drop out the face crop condition in this branch to prevent trivial reconstruction via pixel-perfect information leakage.
  • Figure 4: Finetuning pipeline. We build input–target pairs from a few-shot subject set and apply a facially masked paired Flow Matching loss to personalize the model at test time.
  • Figure 5: Base Clothing (BC) Standardization. We apply different preprocessing strategies based on the data. For DeepFashion [deepfashion] and Commerce images [Zhu2024MMVTO, Cheng2023TryOnDiffusion], we generate the base garment while preserving pose and identity via pixel-aligned editing (Prompt V1). For FFHQ [ffhq], we use generative outpainting to expand limited face crops into full-body samples (Prompt V2).
  • ...and 11 more figures
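
The donor-based reposing idea in Figure 2(c) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name `donor_repose`, the array shapes, and the choice to intersect the source visibility mask with the donor mask are all assumptions made for illustration. The only detail taken from the captions is that a donor visibility mask is applied directly in UV space, with no image-space rendering.

```python
import numpy as np


def donor_repose(texture, src_mask, donor_mask):
    """Hypothetical helper: mask a partial UV texture with a donor visibility mask.

    texture    : (H, W, 3) float array, partial UV texture of the source image
    src_mask   : (H, W) bool array, texels visible in the source pose
    donor_mask : (H, W) bool array, texels visible in a random donor pose

    Assumption: a texel stays visible only if it is visible in BOTH masks,
    so the occlusion boundary no longer matches the source pose alone.
    """
    visible = src_mask & donor_mask
    # Zero out every texel that is hidden in either the source or the donor pose.
    return texture * visible[..., None], visible


# Toy example on an 8x8 UV grid.
rng = np.random.default_rng(0)
H = W = 8
texture = rng.random((H, W, 3))

src_mask = np.zeros((H, W), dtype=bool)
src_mask[:, :6] = True      # source pose sees the left 6 columns

donor_mask = np.zeros((H, W), dtype=bool)
donor_mask[2:, :] = True    # donor pose sees the bottom 6 rows

masked_texture, visible = donor_repose(texture, src_mask, donor_mask)
```

Because the operation is a single element-wise product in UV space, it is cheap enough to run online for every training sample, which is the efficiency argument the caption makes against image-space rendering of pseudo-pairs.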