Total Selfie: Generating Full-Body Selfies

Bowei Chen; Brian Curless; Ira Kemelmacher-Shlizerman; Steven M. Seitz

TL;DR

The paper addresses the challenge of producing full-body selfies from photographs taken at arm's length, which suffer from a limited field of view and perspective distortion. It introduces Total Selfie, a diffusion-based framework that generates a full-body image in a target pose from four input selfies (face, upper body, lower body, shoes) and a background image, guided by an automatically selected reference pose. The approach trains a selfie-conditioned inpainting model on a synthetic four-selfie-to-full-body dataset, then applies per-capture fine-tuning, face undistortion, target-pose selection, pose-guided generation with a ControlNet, and appearance refinement to preserve identity and clothing. Experiments on twelve individuals across diverse scenes show high-fidelity outputs that are more realistic than those of baselines and ablated variants, indicating the method's practical potential for realistic background composition and pose transfer.

Abstract

We present a method to generate full-body selfies from photographs originally taken at arm's length. Because self-captured photos are typically taken close up, they have limited field of view and exaggerated perspective that distorts facial shapes. We instead seek to generate the photo someone else would take of you from a few feet away. Our approach takes as input four selfies of your face and body and a background image, and generates a full-body selfie in a desired target pose. We introduce a novel diffusion-based approach to combine all of this information into high-quality, well-composed photos of you with the desired pose and background.

Paper Structure (6 sections, 2 equations, 7 figures, 1 table)

This paper contains 6 sections, 2 equations, 7 figures, 1 table.

Introduction
Related Work
Total Selfie
Training Selfie-Conditioned Inpainting Model
Per-Capture Preprocessing and Fine-Tuning
Experiments

Figures (7)

Figure 1: We generate full-body selfies of you (right) from self-captured images of your face and body (top left) and a background image. You can choose any target pose from a reference photo; we auto-select a set of good candidates from your photo collection (bottom).
Figure 2: Overview of Total Selfie. First, we train a selfie-conditioned inpainting model on a synthetic selfie-to-full-body dataset (blue box). Second, we fine-tune the trained model on a specific capture (orange box) and use it to produce a full-body selfie with the help of a modified ControlNet (for pose) and appearance refinement (for face and shoes), visualized in the purple box. Note that the images in the green dashed box (inside the orange box) serve as input and conditional signals to the inpainting model; arrows are omitted for simplicity.
Figure 3: Results for different modules of our pipeline. The background image is omitted due to space; regions inside the bounding box in (c) are to be inpainted. The Canny edge image in (b) is detected from the reference image (inset). Generating without fine-tuning (c) produces an inaccurate outfit and identity. With fine-tuning, the pipeline (d) generates the correct outfit with reasonable shading and clothing details (e.g., wrinkles on the upper garment), but with the wrong identity. With appearance refinement, the full pipeline (e) yields high-quality full-body selfies.
Figure 4: Results. The second column shows the Canny edge images detected from reference images (shown as insets). Regions inside the yellow box in (c) are the masked regions. Total Selfie generates realistic, full-body images of different individuals with diverse poses and expressions against a variety of backgrounds, while preserving facial expression and clothing. The results are robust to selfies captured in different ways, such as those taken with one or two hands or from a downward-looking perspective (row 5), and to target pose images in outfits that differ somewhat from the input selfies.
Figure 5: Qualitative comparison with the two best-performing baselines. For this comparison, we used the Canny edges of the real photo as the target pose (inset of (f)). Our pipeline clearly outperforms the baselines in photorealism and faithfulness (zoom in for details, including faces and shoes). Note that while the selfies, background image, and real photo were captured in the same session, variations in lighting conditions, auto exposure, white balance, and other factors may result in intensity and color tone differences.
...and 2 more figures
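
The captions for Figures 3 and 4 describe a bounding-box region of the background that is masked and then inpainted. As a rough illustration of that idea only, the following Python sketch builds a binary mask from a box and pastes generated pixels into the masked region. It is not the authors' implementation: the function names box_mask and composite, the (top, left, bottom, right) box convention, and the use of nested lists for images are all assumptions made for this example, and in the actual method the content of the masked region is produced by the diffusion model rather than pasted in.

```python
from typing import List, Tuple

Image = List[List[int]]  # hypothetical single-channel image as rows of ints


def box_mask(height: int, width: int, box: Tuple[int, int, int, int]) -> Image:
    """Return a height x width mask: 1 inside the box, 0 elsewhere.

    box is (top, left, bottom, right) with bottom/right exclusive
    (an assumed convention); it is clipped to the image bounds.
    """
    top, left, bottom, right = box
    top, bottom = max(0, top), min(height, bottom)
    left, right = max(0, left), min(width, right)
    return [
        [1 if top <= y < bottom and left <= x < right else 0 for x in range(width)]
        for y in range(height)
    ]


def composite(background: Image, generated: Image, mask: Image) -> Image:
    """Keep background pixels where mask is 0, generated pixels where it is 1."""
    return [
        [g if m else b for b, g, m in zip(b_row, g_row, m_row)]
        for b_row, g_row, m_row in zip(background, generated, mask)
    ]


if __name__ == "__main__":
    mask = box_mask(4, 5, (1, 1, 3, 4))
    background = [[0] * 5 for _ in range(4)]
    generated = [[9] * 5 for _ in range(4)]
    for row in composite(background, generated, mask):
        print(row)
```

Only pixels inside the box change, so the background outside the masked region is left exactly as captured.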
