RTGaze: Real-Time 3D-Aware Gaze Redirection from a Single Image
Hengfei Wang, Zhongqun Zhang, Yihua Cheng, Hyung Jin Chang
TL;DR
RTGaze tackles real-time, 3D-aware gaze redirection from a single image. It learns a gaze-controllable facial representation from face images and gaze prompts, and it distills 3D priors from a pretrained 3D portrait generator into a lightweight triplane-based renderer. The backbone uses dual encoders that extract high- and low-frequency features, and gaze prompts are injected through cross-attention. Training combines depth-prior distillation, optimized via $\mathcal{L} = \alpha \mathcal{L}_{\mathcal{R}} + \beta \mathcal{L}_{\mathcal{D}} + \gamma \mathcal{L}_{\mathcal{P}}$, with a dedicated eye-region reconstruction loss. Empirically, RTGaze achieves state-of-the-art efficiency, with image quality and gaze accuracy competitive with or better than prior methods on ETH-XGaze, ColumbiaGaze, and MPIIFaceGaze. It processes an image in about $61\,\mathrm{ms}$ and requires no test-time GAN inversion. This combination of fast inference, 3D consistency, and high-fidelity gaze control, while preserving identity and photorealism, makes the method relevant to real-time digital humans, AR/VR, and broadcast applications. The key innovations are the separation of appearance and geometry through dual encoders, gaze-prompt cross-attention, and the distillation of 3D priors into a lightweight rendering module.
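The summary states that gaze prompts are injected through cross-attention but gives no layer details. The following is a minimal NumPy sketch of one plausible form, assuming single-head attention in which flattened encoder feature tokens act as queries and embedded gaze-prompt tokens act as keys and values, followed by a residual add. The function names (`embed_gaze`, `gaze_cross_attention`), the sin/cos encoding of pitch and yaw, and all dimensions are hypothetical illustrations, not RTGaze's actual architecture.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def embed_gaze(pitch, yaw, w_embed, num_tokens):
    # Hypothetical gaze prompt: encode (pitch, yaw) in radians as sin/cos
    # features, then linearly project them into `num_tokens` prompt tokens.
    feats = np.array([np.sin(pitch), np.cos(pitch), np.sin(yaw), np.cos(yaw)])
    return (feats @ w_embed).reshape(num_tokens, -1)  # (M, D)


def gaze_cross_attention(feat_tokens, gaze_tokens, w_q, w_k, w_v):
    # Face feature tokens (N, C) query the gaze prompt tokens (M, D).
    q = feat_tokens @ w_q                      # (N, d)
    k = gaze_tokens @ w_k                      # (M, d)
    v = gaze_tokens @ w_v                      # (M, C)
    scores = q @ k.T / np.sqrt(q.shape[-1])    # (N, M) scaled dot products
    attn = softmax(scores, axis=-1)            # each feature token weights the prompts
    return feat_tokens + attn @ v              # residual injection, (N, C)


# Usage with random weights (assumed sizes: 64 feature tokens, 4 prompt tokens).
rng = np.random.default_rng(0)
N, C, M, D, d = 64, 32, 4, 16, 16
feats = rng.standard_normal((N, C))
w_embed = rng.standard_normal((4, M * D)) * 0.1
w_q = rng.standard_normal((C, d)) * 0.1
w_k = rng.standard_normal((D, d)) * 0.1
w_v = rng.standard_normal((D, C)) * 0.1

gaze = embed_gaze(np.deg2rad(10.0), np.deg2rad(-20.0), w_embed, M)
out = gaze_cross_attention(feats, gaze, w_q, w_k, w_v)
print(out.shape)  # (64, 32)
```

Because the gaze prompt enters only as keys and values, changing the target gaze changes what each feature token aggregates while the feature layout stays fixed, which is what lets a downstream renderer consume the conditioned features unchanged.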
Abstract
Gaze redirection methods aim to generate realistic human face images with controllable eye movement. However, recent methods often struggle with 3D consistency, efficiency, or quality, limiting their practical applications. In this work, we propose RTGaze, a real-time and high-quality gaze redirection method. Our approach learns a gaze-controllable facial representation from face images and gaze prompts, then decodes this representation via neural rendering for gaze redirection. Additionally, we distill face geometric priors from a pretrained 3D portrait generator to enhance generation quality. We evaluate RTGaze both qualitatively and quantitatively, demonstrating state-of-the-art performance in efficiency, redirection accuracy, and image quality across multiple datasets. Our system achieves real-time, 3D-aware gaze redirection with a feedforward network (about 0.06 s per image), making it 800× faster than previous state-of-the-art 3D-aware methods.
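To put the timing figures in perspective, here is a back-of-the-envelope calculation using only the numbers stated above: about 61 ms per image and an 800× speedup. It assumes sequential processing at batch size 1; the helper names are hypothetical, and the implied baseline latency is a rough derivation, not a reported measurement.

```python
def throughput_per_second(latency_s: float) -> float:
    # Images per second for sequential, one-at-a-time processing.
    return 1.0 / latency_s


def implied_baseline_latency(latency_s: float, speedup: float) -> float:
    # Per-image latency of a method that is `speedup` times slower.
    return latency_s * speedup


rtgaze_latency = 0.061  # ~61 ms per image, as stated in the TL;DR
fps = throughput_per_second(rtgaze_latency)
baseline = implied_baseline_latency(rtgaze_latency, 800)

print(f"{fps:.1f} images/s")       # 16.4 images/s
print(f"{baseline:.1f} s/image")   # 48.8 s/image
```

So the stated figures correspond to roughly 16 images per second for RTGaze, versus on the order of 48 s per image for the compared 3D-aware methods.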
