SUPER: Selfie Undistortion and Head Pose Editing with Identity Preservation

Polina Karpikova, Andrei Spiridonov, Anna Vorontsova, Anastasia Yaschenko, Ekaterina Radionova, Igor Medvedev, Alexander Limonov

TL;DR

This work addresses perspective distortion and head‑pose misalignment in close‑up selfies by introducing SUPER, a hybrid pipeline that jointly optimizes a 3D GAN latent code $w$ and camera parameters $c$ via 3D GAN inversion. It uses depth‑based 3D warping to render a novel view and a visibility‑based blending strategy to seamlessly combine warped texture with GAN‑generated content, thereby preserving identity. An encoder‑driven initialization (TriPlaneNet for the latent code, Deep3DFaceRecon for the camera parameters) provides a starting point for fast, stable optimization, and the final EG3D render yields both an image and a depth map, from which a depth‑induced mesh is built for warping. Experiments on the CMDP benchmark and the authors’ HeRo dataset demonstrate state‑of‑the‑art performance in both face undistortion and head pose editing, enabling photorealistic selfie editing with improved detail and identity preservation.
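The joint optimization of $w$ and $c$ can be illustrated with a toy example. This is a minimal sketch, not SUPER's implementation: `toy_generator` is a hypothetical stand-in for EG3D that maps a short latent vector and a single scalar camera angle to a 1D "image", the loss is plain mean squared error rather than the paper's actual objective, and central finite differences replace autograd. It shows only the core idea: starting from an initial guess $(w_0, c_0)$, like the one the encoders would supply, gradient descent updates the latent and camera parameters together to reduce the reconstruction error.

```python
import math


def toy_generator(w, c, n_pixels=16):
    """Hypothetical stand-in for a 3D-aware generator G(w, c); NOT EG3D.

    w: list of latent values (sinusoid amplitudes); c: scalar "camera" angle
    that shifts where the 1D image is sampled. Returns n_pixels intensities.
    """
    xs = [2.0 * i / (n_pixels - 1) - 1.0 for i in range(n_pixels)]  # in [-1, 1]
    return [sum(wk * math.sin((k + 1) * (x + c)) for k, wk in enumerate(w))
            for x in xs]


def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


def invert(target, w0, c0, steps=100, lr=0.2, eps=1e-4):
    """Jointly optimize latent w and camera c by gradient descent on MSE.

    (w0, c0) plays the role of an encoder-provided initialization. Gradients
    are central finite differences, standing in for autograd.
    Returns (w, c, final_loss).
    """
    params = list(w0) + [c0]  # latent entries first, camera parameter last

    def loss(p):
        return mse(toy_generator(p[:-1], p[-1], len(target)), target)

    for _ in range(steps):
        grad = []
        for j in range(len(params)):
            up, down = params[:], params[:]
            up[j] += eps
            down[j] -= eps
            grad.append((loss(up) - loss(down)) / (2 * eps))
        # one simultaneous step on latent and camera parameters
        params = [p - lr * g for p, g in zip(params, grad)]
    return params[:-1], params[-1], loss(params)


if __name__ == "__main__":
    target = toy_generator([0.8, -0.3], 0.2)  # "photo" from a known latent/camera
    w, c, final_loss = invert(target, w0=[0.5, 0.0], c0=0.0)
    print(f"w = {[round(v, 3) for v in w]}, c = {c:.3f}, loss = {final_loss:.2e}")
```

The default of 100 steps loosely echoes the iteration budget discussed for Figure 5 but is otherwise arbitrary in this toy. Keeping $c$ as a free variable lets the toy model account for the target's viewpoint through the camera parameter instead of forcing it into the latent code.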

Abstract

Self-portraits captured from a short distance might look unnatural or even unattractive due to heavy perspective distortions that malform facial features and to ill-placed head poses. In this paper, we propose SUPER, a novel method for eliminating distortions and adjusting head pose in a close-up face crop. We perform 3D GAN inversion for a facial image by optimizing the camera parameters and the face latent code, which yields a generated image. In addition, we estimate depth from the obtained latent code, create a depth-induced 3D mesh, and render it with updated camera parameters to obtain a warped portrait. Finally, we apply visibility-based blending, so that visible regions are reprojected and occluded parts are restored with a generative model. Experiments on face undistortion benchmarks and on our self-collected Head Rotation dataset (HeRo) show that SUPER outperforms previous approaches both qualitatively and quantitatively, opening new possibilities for photorealistic selfie editing.
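The warping step, in which depth is lifted to 3D and re-rendered from updated camera parameters, can be sketched in a few lines. This is a simplified illustration under assumptions not taken from the paper: a single-channel image, a pinhole camera with focal length `f` and principal point (`cx`, `cy`), a head-pose change modeled as a yaw rotation about a vertical axis through a hypothetical pivot depth `pivot_z`, and an optional camera pull-back `dz` with a new focal length `f_new` for undistortion. Where SUPER builds a depth-induced triangle mesh and renders it, this sketch substitutes nearest-pixel point splatting with a z-buffer, which is simpler but leaves holes wherever the surface stretches. The returned `visible` mask marks target pixels that received a source sample.

```python
import math


def warp_with_depth(image, depth, f, cx, cy, yaw, pivot_z, dz=0.0, f_new=None):
    """Forward-warp a grayscale image to a new pinhole camera using its depth.

    image, depth: H x W lists of floats. f, cx, cy: source intrinsics.
    yaw: rotation (radians) of the 3D points about a vertical axis through
    (0, 0, pivot_z), emulating a head-pose change (hypothetical parameterization).
    dz: moves the camera back by dz; f_new: target focal length (default f).
    Nearest-pixel point splatting stands in for SUPER's mesh rendering.
    Returns (warped, visible), where visible[v][u] is True if any source pixel
    landed on target pixel (u, v).
    """
    h, w = len(image), len(image[0])
    f_new = f if f_new is None else f_new
    warped = [[0.0] * w for _ in range(h)]
    zbuf = [[math.inf] * w for _ in range(h)]
    cos_t, sin_t = math.cos(yaw), math.sin(yaw)
    for v in range(h):
        for u in range(w):
            d = depth[v][u]
            # unproject pixel (u, v) at depth d into camera-space 3D coordinates
            x, y = (u - cx) * d / f, (v - cy) * d / f
            # rotate about the vertical axis through the pivot, then pull back
            zc = d - pivot_z
            x2 = cos_t * x + sin_t * zc
            z2 = -sin_t * x + cos_t * zc + pivot_z + dz
            if z2 <= 0:
                continue  # point falls behind the new camera
            # project into the target view, rounding to the nearest pixel
            u2 = round(f_new * x2 / z2 + cx)
            v2 = round(f_new * y / z2 + cy)
            # z-buffer test: keep only the sample closest to the camera
            if 0 <= u2 < w and 0 <= v2 < h and z2 < zbuf[v2][u2]:
                zbuf[v2][u2] = z2
                warped[v2][u2] = image[v][u]
    visible = [[not math.isinf(z) for z in row] for row in zbuf]
    return warped, visible
```

As a sanity check, with a planar depth map of value `d`, pulling the camera back by `dz` while scaling the focal length by `(d + dz) / d` reproduces the input exactly. On a real face, it is the varying depth (a nose closer than the ears) that makes this change of camera distance alter the perspective.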

Paper Structure

This paper contains 20 sections, 3 equations, 7 figures, and 2 tables.

Figures (7)

  • Figure 1: A SUPER selfie editing example. Given a selfie, we can modify the head pose and remove perspective distortion seamlessly, obtaining photorealistic and detailed corrected portraits.
  • Figure 2: An overview of our pipeline. We leverage a TriPlaneNet encoder to obtain an initial face latent code $w_0$, and Deep3DFaceRecon to estimate initial camera parameters $c_0$. Then, we perform 3D GAN inversion for an input facial image by optimizing the camera parameters $c$ and the face latent code $w$. The optimized face latent code $\hat{w}$ and novel camera parameters $c_{novel}$ are passed into the EG3D model [Chan2021EfficientG3], which generates an image and estimates a depth map. Afterward, we create a depth-induced 3D mesh and render it to obtain a warped portrait. The final novel view is synthesized by visibility-based blending, so that visible regions are reprojected and occluded parts are restored with a generative model.
  • Figure 3: Leftmost: our HeRo capturing setup with smartphones mounted on a rig. Others: series of photos of the same individuals, captured simultaneously by the Front, Left, Right, and Top cameras.
  • Figure 4: Qualitative comparisons on the CMDP dataset. SUPER excels at processing severely distorted faces. The model not only restores occluded regions successfully but also preserves crucial identity details, as highlighted in the crops.
  • Figure 5: PSNR, SSIM, and LPIPS scores for different numbers of iterations. Optimal quality is achieved within the first 100 iterations; further optimization brings only a negligible increase in the ID score and does not improve the other scores.
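The visibility-based blending described in the Figure 2 caption can be written as a per-pixel composite. The notation here is ours rather than the paper's, and this is the simplest hard-mask form; the paper may soften or otherwise refine the mask near its boundary. Let $I_{\text{warp}}$ be the warped portrait, $I_{\text{gen}}$ the EG3D image rendered from $(\hat{w}, c_{novel})$, and $M \in \{0, 1\}^{H \times W}$ the visibility mask, equal to 1 where a pixel is covered by the reprojected depth-induced mesh:

$$
I_{\text{out}} = M \odot I_{\text{warp}} + (1 - M) \odot I_{\text{gen}}
$$

Visible pixels thus keep the input's own reprojected texture, which is what carries the fine identity details, while occluded pixels fall back to the generated content.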