Fast Registration of Photorealistic Avatars for VR Facial Animation
Chaitanya Patel, Shaojie Bai, Te-Li Wang, Jason Saragih, Shih-En Wei
TL;DR
This work tackles fast, high-fidelity registration of photorealistic VR avatars from images captured by headset-mounted infrared cameras, addressing a core domain gap between IR camera images and avatar renderings. It decouples the problem into a transformer-based iterative refinement module and an avatar-conditioned image-to-image style transfer module, enabling online registration that generalizes across identities without costly offline optimization. The approach outperforms direct regression in the online setting and approaches the quality of offline results while remaining suitable for real-time use. It is validated on a large, multi-identity dataset that is released publicly. The key contribution is a generic two-module framework in which domain adaptation and pose-expression estimation reinforce each other, supported by detailed ablations and architectural details intended to spur further research. This has practical impact for immersive VR telepresence and adaptive real-time avatar animation.
Abstract
Virtual Reality (VR) holds the promise of social interactions that can feel more immersive than other media. Key to this is the ability to accurately animate a personalized photorealistic avatar of a person wearing a VR headset, which in turn requires labels for headset-mounted camera (HMC) images to be acquired efficiently and accurately. This is challenging due to oblique camera views and differences in image modality. In this work, we first show that the domain gap between the avatar and HMC images is one of the primary sources of difficulty: a transformer-based architecture achieves high accuracy on domain-consistent data, but degrades when the domain gap is re-introduced. Building on this finding, we propose a system split into two parts: an iterative refinement module that takes in-domain inputs, and a generic avatar-guided image-to-image domain transfer module conditioned on current estimates. These two modules reinforce each other: domain transfer becomes easier when close-to-ground-truth examples are shown, and better domain-gap removal in turn improves the registration. Our system obviates the need for costly offline optimization and produces online registration of higher quality than direct regression methods. We validate the accuracy and efficiency of our approach through extensive experiments on a commodity headset, demonstrating significant improvements over these baselines. To stimulate further research in this direction, we make our large-scale dataset and code publicly available.
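The abstract describes registration as an alternation between a domain transfer step conditioned on the current estimate and a refinement step that only sees in-domain input. The following is a minimal NumPy sketch of that control flow under heavily simplified, hypothetical assumptions: a linear toy "avatar" (`render` returns a mean image plus a basis times an expression code), a domain gap modelled as one global gain and offset, a `style_transfer` that fits that gain/offset against the rendering of the current estimate, and a `refine` step that updates the code by least squares. None of these names or the toy data come from the paper; its learned transformer and image-to-image networks are replaced here by closed-form fits purely for illustration.

```python
import numpy as np

# Hypothetical toy stand-ins, NOT the paper's networks:
#   render(z)          linear avatar decoder: mean image + basis @ code
#   style_transfer()   maps an HMC-domain image into the avatar domain,
#                      conditioned on the rendering of the current estimate
#   refine()           updates the estimate from avatar-domain input only


def render(z, mean, basis):
    """Linear toy avatar: image = mean + basis @ z."""
    return mean + basis @ z


def style_transfer(hmc, conditioning):
    """Fit a global gain/offset explaining `hmc` from `conditioning`, then invert it.

    With the exact ground-truth rendering as conditioning, the gain and offset
    are recovered exactly; with a poor estimate they are biased, so the
    translated image is off.
    """
    design = np.stack([conditioning, np.ones_like(conditioning)], axis=1)
    (gain, offset), *_ = np.linalg.lstsq(design, hmc, rcond=None)
    return (hmc - offset) / gain


def refine(in_domain, z, mean, basis):
    """One refinement step: least-squares code update from the in-domain residual."""
    delta, *_ = np.linalg.lstsq(basis, in_domain - render(z, mean, basis), rcond=None)
    return z + delta


def register(hmc, mean, basis, z0, num_iters=10):
    """Alternate the two modules: translate using the current estimate, then refine."""
    z = z0
    for _ in range(num_iters):
        in_domain = style_transfer(hmc, render(z, mean, basis))
        z = refine(in_domain, z, mean, basis)
    return z


# Toy data (hypothetical): 64 "pixels", a 4-dimensional expression code.
rng = np.random.default_rng(0)
n_pixels, n_codes = 64, 4
mean = rng.normal(size=n_pixels)
basis = rng.normal(size=(n_pixels, n_codes))
z_true = rng.normal(size=n_codes)

# Hypothetical domain gap: HMC intensities are an affine remap of the avatar image.
hmc = 1.7 * render(z_true, mean, basis) + 0.3
```

In this toy, conditioning `style_transfer` on the ground-truth rendering removes the domain gap exactly, while conditioning on a rough estimate (for example the mean face, `z = 0`) leaves a residual error, which mirrors the abstract's point that domain transfer is easier when close-to-ground-truth examples are shown.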
