LightHeadEd: Relightable & Editable Head Avatars from a Smartphone
Pranav Manu, Astitva Srivastava, Amit Raj, Varun Jampani, Avinash Sharma, P. J. Narayanan
TL;DR
LightHeadEd tackles the high barrier to creating relightable head avatars by replacing expensive light stages with a polarized monocular capture pipeline on a smartphone. It introduces a textured 2D Gaussian head representation embedded in FLAME UV space and a self-supervised, two-stage learning framework that disentangles geometry and appearance into UV albedo, normal, and roughness maps plus an environment lighting map, enabling relighting and editing. The approach is validated on the DuoPolo polarized dataset, where it outperforms the state of the art in geometry, shading fidelity, and editing capabilities, and ablations confirm the importance of residual UV maps and UV resolution. This work paves the way for accessible, editable, photorealistic head avatars for metaverse, telepresence, and AR applications, while acknowledging limitations in inner-mouth detail and hair realism and outlining ethical considerations for data use.
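To make "2D Gaussians embedded in FLAME UV space" more concrete, here is a minimal Python sketch under simplifying assumptions; it is not the authors' implementation. Each texel of a UV map that lands inside a triangle of the UV layout is lifted to 3D by barycentric interpolation of that triangle's posed vertices, and a per-texel residual offset (standing in for the residual UV maps the TL;DR mentions) is added to get a Gaussian center. The names `barycentric` and `gaussian_centers` are hypothetical, and the sketch assumes UV faces and mesh faces share vertex indices, which simplifies FLAME's real UV layout. Gaussian orientation, scale, and opacity are omitted.

```python
import numpy as np

def barycentric(p, a, b, c):
    """Barycentric weights of 2D point p w.r.t. UV triangle (a, b, c)."""
    v0, v1, v2 = b - a, c - a, p - a
    d00, d01, d11 = v0 @ v0, v0 @ v1, v1 @ v1
    d20, d21 = v2 @ v0, v2 @ v1
    denom = d00 * d11 - d01 * d01
    v = (d11 * d20 - d01 * d21) / denom
    w = (d00 * d21 - d01 * d20) / denom
    return np.array([1.0 - v - w, v, w])

def gaussian_centers(uv_coords, faces, verts, res, offset_map):
    """Hypothetical helper: one Gaussian center per texel covered by the UV layout.

    uv_coords:  (V, 2) UV coordinates in [0, 1]
    faces:      (F, 3) triangle indices, assumed shared by UVs and mesh (simplification)
    verts:      (V, 3) posed mesh vertices (e.g. one frame of a FLAME-like model)
    offset_map: (res, res, 3) residual 3D offsets stored per texel
    """
    centers, texels = [], []
    for i in range(res):            # texel row -> v coordinate
        for j in range(res):        # texel column -> u coordinate
            p = np.array([(j + 0.5) / res, (i + 0.5) / res])  # texel center in UV
            for f in faces:
                bary = barycentric(p, *uv_coords[f])
                if np.all(bary >= -1e-9):  # inside (or on an edge of) this triangle
                    # Lift to 3D on the posed mesh, then add the learned residual.
                    centers.append(bary @ verts[f] + offset_map[i, j])
                    texels.append((i, j))
                    break
    return np.array(centers), texels
```

In this sketch the centers are recomputed from the posed mesh every frame, so the Gaussians follow the head model's pose and expression deformations, while the offset map can add detail the coarse mesh lacks.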
Abstract
Creating photorealistic, animatable, and relightable 3D head avatars traditionally requires an expensive Lightstage with multiple calibrated cameras, which puts it out of reach for widespread adoption. To bridge this gap, we present a novel, cost-effective approach for creating high-quality relightable head avatars using only a smartphone equipped with polarizing filters. Our approach simultaneously captures cross-polarized and parallel-polarized video streams in a dark room with a single point-light source, which separates the skin's diffuse and specular components during dynamic facial performances. We introduce a hybrid representation that embeds 2D Gaussians in the UV space of a parametric head model, enabling efficient real-time rendering while preserving high-fidelity geometric details. Our learning-based neural analysis-by-synthesis pipeline decouples pose- and expression-dependent geometric offsets from appearance, decomposing the surface into albedo, normal, and specular UV texture maps, along with environment maps. We also collect a unique dataset of various subjects performing diverse facial expressions and head movements.
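The diffuse/specular separation rests on a standard property of polarized illumination: single-bounce specular reflection largely preserves the light's polarization, whereas subsurface-scattered diffuse light is depolarized. A cross-polarized frame therefore contains approximately only diffuse light, while a parallel-polarized frame contains diffuse plus specular. The minimal Python sketch below shows this common difference-based separation; it illustrates the principle and is not necessarily the authors' exact procedure. It assumes the two frames are pixel-aligned, identically exposed, and linearized, and both function names are hypothetical (`srgb_to_linear` implements the standard sRGB decoding curve).

```python
import numpy as np

def srgb_to_linear(img):
    """Invert the standard sRGB transfer curve; expects values in [0, 1]."""
    img = np.asarray(img, dtype=np.float64)
    return np.where(img <= 0.04045, img / 12.92, ((img + 0.055) / 1.055) ** 2.4)

def separate_diffuse_specular(cross, parallel):
    """Split a registered cross/parallel-polarized pair (linear values).

    The cross-polarized frame blocks polarization-preserving specular reflection,
    so it approximates the diffuse term (up to a filter-dependent scale). The
    parallel-polarized frame holds diffuse + specular, so the difference
    isolates the specular term.
    """
    cross = np.asarray(cross, dtype=np.float64)
    parallel = np.asarray(parallel, dtype=np.float64)
    diffuse = cross
    specular = np.clip(parallel - cross, 0.0, None)  # clamp noise-induced negatives
    return diffuse, specular
```

Because the two streams are captured simultaneously, each frame pair sees the same expression, which is what makes a per-pixel difference meaningful for dynamic performances.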
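Once albedo, normal, and specular maps have been recovered, relighting amounts to re-evaluating a shading model under new illumination. The abstract does not specify the reflectance model or how the environment map is parameterized, so this Python sketch is hedged on both counts: it approximates an environment map by a few directional lights and uses Lambertian diffuse plus a Blinn-Phong specular lobe as a stand-in model. `relight`, `normalize`, and their parameters are hypothetical names.

```python
import numpy as np

def normalize(v, eps=1e-12):
    """Scale vectors along the last axis to unit length (eps avoids division by zero)."""
    v = np.asarray(v, dtype=np.float64)
    return v / np.maximum(np.linalg.norm(v, axis=-1, keepdims=True), eps)

def relight(albedo, normals, spec, shininess, light_dirs, light_rgb, view_dir):
    """Shade UV-space maps under directional lights approximating an environment map.

    albedo:     (H, W, 3) diffuse albedo
    normals:    (H, W, 3) surface normals
    spec:       (H, W, 1) specular intensity
    shininess:  Blinn-Phong exponent (a stand-in for a roughness map)
    light_dirs: (L, 3) directions pointing toward each light
    light_rgb:  (L, 3) RGB radiance of each light
    view_dir:   (3,) direction pointing toward the camera
    """
    n = normalize(normals)
    v = normalize(view_dir)
    out = np.zeros(np.shape(albedo), dtype=np.float64)
    for l, c in zip(normalize(light_dirs), np.asarray(light_rgb, dtype=np.float64)):
        ndotl = np.clip(n @ l, 0.0, None)[..., None]        # (H, W, 1) cosine term
        h = normalize(l + v)                                # Blinn-Phong half vector
        ndoth = np.clip(n @ h, 0.0, None)[..., None]
        out += albedo * ndotl * c                           # Lambertian diffuse
        out += spec * ndoth ** shininess * (ndotl > 0) * c  # specular, lit side only
    return out
```

In this sketch, swapping the light list for samples from a different environment map relights the same texture maps without any retraining, which illustrates why decomposing appearance into these maps enables relighting.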
