LightHeadEd: Relightable & Editable Head Avatars from a Smartphone

Pranav Manu, Astitva Srivastava, Amit Raj, Varun Jampani, Avinash Sharma, P. J. Narayanan

TL;DR

LightHeadEd tackles the high barrier to creating relightable head avatars by replacing expensive light stages with a polarized monocular capture pipeline on a smartphone. It introduces a textured 2D Gaussian head representation embedded in FLAME UV space and a self-supervised, two-stage learning framework that disentangles geometry and appearance into UV maps of albedo, normals, and roughness, plus an environment lighting map, enabling relighting and editing. The approach is validated on the DuoPolo polarized dataset, where it demonstrates better geometry, shading fidelity, and editing capabilities than state-of-the-art methods, and ablations confirm the importance of residual UV maps and UV resolution. This work paves the way for accessible, editable, photorealistic head avatars for metaverse, telepresence, and AR applications; it also acknowledges limitations in inner-mouth detail and hair realism and outlines ethical considerations for data use.
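Relighting follows from this decomposition: once albedo, normals, and roughness are stored as UV maps, new illumination can be applied by re-evaluating a shading model per texel. The BRDF and environment-map parameterization used by LightHeadEd are not stated in this summary, so the following is only a minimal sketch under assumptions: it swaps in a Lambertian diffuse term plus a Blinn-Phong specular lobe, approximates the environment map with a few directional lights, and uses the common roughness-to-exponent mapping 2/alpha^2 - 2. The names `shade_uv` and `ks` are hypothetical.

```python
import numpy as np

def normalize(v):
    """Normalize vectors along the last axis (epsilon avoids division by zero)."""
    return v / np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), 1e-12)

def shade_uv(albedo, normal, roughness, lights, view_dir, ks=0.04):
    """Relight UV-space attribute maps (illustrative sketch, not the paper's BRDF).

    albedo: (H, W, 3), normal: (H, W, 3), roughness: (H, W) in (0, 1].
    lights: list of (direction (3,), rgb intensity (3,)) pairs that stand in
            for a sampled environment map (assumption).
    view_dir: (3,) direction toward the camera. ks: assumed specular strength.
    """
    albedo = np.asarray(albedo, dtype=float)
    n = normalize(np.asarray(normal, dtype=float))
    v = normalize(np.asarray(view_dir, dtype=float))
    # Common Beckmann-roughness -> Blinn-Phong exponent mapping (assumption).
    exponent = 2.0 / np.clip(roughness, 1e-3, 1.0) ** 2 - 2.0
    out = np.zeros_like(albedo)
    for l_dir, rgb in lights:
        l = normalize(np.asarray(l_dir, dtype=float))
        h = normalize(l + v)                                   # half vector
        n_dot_l = np.clip(n @ l, 0.0, None)[..., None]         # (H, W, 1)
        n_dot_h = np.clip(n @ h, 0.0, None)                    # (H, W)
        # Specular only where the surface faces the light.
        spec = ks * n_dot_h ** exponent * (n_dot_l[..., 0] > 0)
        out += (albedo / np.pi * n_dot_l + spec[..., None]) * np.asarray(rgb, dtype=float)
    return out
```

For example, a 1x1 map with albedo 0.5 and roughness 0.5, lit and viewed head-on along its normal, yields 0.5/pi + 0.04 per channel; a light behind the surface contributes nothing.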

Abstract

Creating photorealistic, animatable, and relightable 3D head avatars traditionally requires an expensive Lightstage with multiple calibrated cameras, which makes it inaccessible for widespread adoption. To bridge this gap, we present a novel, cost-effective approach for creating high-quality relightable head avatars using only a smartphone equipped with polaroid filters. Our approach simultaneously captures cross-polarized and parallel-polarized video streams in a dark room with a single point-light source, separating the skin's diffuse and specular components during dynamic facial performances. We introduce a hybrid representation that embeds 2D Gaussians in the UV space of a parametric head model, enabling efficient real-time rendering while preserving high-fidelity geometric details. Our learning-based neural analysis-by-synthesis pipeline decouples pose- and expression-dependent geometric offsets from appearance, decomposing the surface into albedo, normal, and specular UV texture maps, along with environment maps. We collect a unique dataset of various subjects performing diverse facial expressions and head movements.
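The polarized capture is what lets diffuse and specular reflectance be supervised separately. A common way to split the two, shown here as a minimal sketch rather than the authors' pipeline, assumes a linearly polarized light source, linear (non-gamma-encoded) pixel values, and cross- and parallel-polarized frames already registered pixel-to-pixel. Specular reflection preserves polarization and is blocked in the cross-polarized frame, whereas subsurface-scattered diffuse light is depolarized and appears equally in both frames, so their difference isolates the specular layer. The function name `separate_diffuse_specular` is hypothetical.

```python
import numpy as np

def separate_diffuse_specular(cross, parallel):
    """Split a registered cross/parallel-polarized frame pair into two layers.

    Assumptions: polarized light source, linear pixel values, aligned frames.
    The cross-polarized frame holds only the (filtered) diffuse term; the
    parallel-polarized frame holds the same diffuse term plus specular.
    """
    cross = np.asarray(cross, dtype=np.float64)
    parallel = np.asarray(parallel, dtype=np.float64)
    # Diffuse as seen through the filter; any global factor (the filter passes
    # roughly half of the depolarized light) is typically absorbed into albedo.
    diffuse = cross
    # Sensor noise or slight misregistration can make the difference negative.
    specular = np.clip(parallel - cross, 0.0, None)
    return diffuse, specular
```

In practice the two streams would first need temporal and spatial alignment; the clip only guards against small residual errors.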

Paper Structure

This paper contains 16 sections, 13 equations, 9 figures, and 4 tables.

Figures (9)

  • Figure 1: We introduce LightHeadEd to reconstruct realistic head avatars with editing & relighting support.
  • Figure 2: Proposed dynamic capture setup: Smartphone equipped with polaroid filters (left); Cross-Polarized & Parallel-Polarized Monocular Video Streams (right).
  • Figure 3: Decomposition of appearance & geometry in UV space.
  • Figure 4: Proposed textured Gaussian head representation with primary UV attribute maps.
  • Figure 5: Proposed two-stage training strategy to learn textured Gaussian head avatars with decomposed appearance and geometry.
  • ...and 4 more figures
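Figures 3 and 4 describe Gaussians whose attributes are stored in the UV space of the head model. One way such a UV-embedded representation can be realized is to place one Gaussian per covered texel and lift it onto the posed mesh with barycentric interpolation, optionally adding a per-texel residual offset. The sketch below is hypothetical throughout: it assumes a toy mesh whose faces index vertices and UV coordinates alike (FLAME uses separate UV-face indices), computes only Gaussian centers (rotation, scale, and opacity would come from further UV maps), and uses a brute-force triangle search. `gaussian_centers_from_uv` and `offset_map` are illustrative names.

```python
import numpy as np

def barycentric_2d(p, a, b, c):
    """Barycentric coordinates of 2D point p in triangle (a, b, c)."""
    v0, v1, v2 = b - a, c - a, p - a
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    return np.array([1.0 - v - w, v, w])

def gaussian_centers_from_uv(verts, faces, uvs, res, offset_map=None):
    """Place one Gaussian center per covered texel of a res x res UV map.

    verts: (V, 3) posed vertices; faces: (F, 3) indices into verts AND uvs
    (simplifying assumption); uvs: (V, 2) in [0, 1];
    offset_map: optional (res, res, 3) residual offsets (hypothetical).
    """
    centers, texels = [], []
    for i in range(res):              # row index -> v coordinate
        for j in range(res):          # column index -> u coordinate
            p = np.array([(j + 0.5) / res, (i + 0.5) / res])  # texel center
            for f in faces:
                bc = barycentric_2d(p, *uvs[f])
                if np.all(bc >= -1e-9):        # texel lies in this UV triangle
                    x = bc @ verts[f]          # lift to the 3D surface
                    if offset_map is not None:
                        x = x + offset_map[i, j]
                    centers.append(x)
                    texels.append((i, j))
                    break
    return np.array(centers), texels
```

On a flat unit square split into two triangles with `res = 4`, all 16 texel centers are covered and land at (u, v, 0), for example (0.125, 0.125, 0) for the first texel; because the mesh is posed per frame, the same texel grid follows the head as it deforms.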