EdgeRelight360: Text-Conditioned 360-Degree HDR Image Generation for Real-Time On-Device Video Portrait Relighting
Min-Hui Lin, Mahesh Reddy, Guillaume Berger, Michel Sarkis, Fatih Porikli, Ning Bi
TL;DR
EdgeRelight360 tackles the challenge of privacy-preserving, real-time on-device video portrait relighting by synthesizing 360-degree HDRI environment maps conditioned on text prompts. It introduces a diffusion-based text-to-360-degree HDRI generator trained on quantized HDR data and a lightweight, edge-friendly relighting pipeline that uses a Geometry Net for normals and a light-adding renderer to apply diffuse lighting from the HDRI, all running on mobile hardware with INT8 quantization. The approach achieves real-time on-device inference (about 25 fps end-to-end) and produces temporally stable, photorealistic relighting in dynamic video scenarios, outperforming remote HITL and prior on-device methods in both speed and visual quality. The work demonstrates practical impact for mobile video conferencing, gaming, and AR by enabling controllable lighting directly from textual descriptions while preserving user privacy. Key contributions include end-to-end on-device HDRI generation from text, a lightweight relighting pipeline, quantization-aware training, and extensive on-device evaluation against existing HDRI-based and relighting methods.
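To make the idea of applying diffuse lighting from an HDRI using surface normals concrete, the following is a minimal sketch of textbook Lambertian shading from an equirectangular environment map. It is not the paper's light-adding renderer: the function names (`direction`, `diffuse_irradiance`, `relight_pixel`), the single-channel map, the y-up coordinate convention, and the brute-force summation are all assumptions chosen for illustration.

```python
import math

def direction(row, col, height, width):
    """Unit direction and solid angle of an equirectangular pixel center (y-up, assumed)."""
    theta = math.pi * (row + 0.5) / height      # polar angle measured from +y
    phi = 2.0 * math.pi * (col + 0.5) / width   # azimuth around the y axis
    d = (math.sin(theta) * math.cos(phi),
         math.cos(theta),
         math.sin(theta) * math.sin(phi))
    # Pixels near the poles cover less of the sphere, hence the sin(theta) factor.
    solid_angle = math.sin(theta) * (math.pi / height) * (2.0 * math.pi / width)
    return d, solid_angle

def diffuse_irradiance(env, normal):
    """Cosine-weighted sum of environment radiance over the hemisphere around `normal`."""
    height, width = len(env), len(env[0])
    total = 0.0
    for r in range(height):
        for c in range(width):
            d, dw = direction(r, c, height, width)
            cos_term = max(0.0, sum(a * b for a, b in zip(d, normal)))
            total += env[r][c] * cos_term * dw
    return total

def relight_pixel(albedo, env, normal):
    """Lambertian shading: outgoing radiance = albedo * irradiance / pi."""
    return albedo * diffuse_irradiance(env, normal) / math.pi

# Hypothetical 16x32 environment map with constant radiance 1.0.
uniform_env = [[1.0] * 32 for _ in range(16)]
shade_up = relight_pixel(0.5, uniform_env, (0.0, 1.0, 0.0))
shade_side = relight_pixel(0.5, uniform_env, (1.0, 0.0, 0.0))
```

Under a uniform environment the irradiance is close to pi for every normal, so the shaded value is close to the albedo. A real-time pipeline would presumably precompute or approximate this integral rather than evaluate it per pixel.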
Abstract
In this paper, we present EdgeRelight360, an approach for real-time video portrait relighting on mobile devices that uses text-conditioned generation of 360-degree high dynamic range image (HDRI) maps. We propose a diffusion-based text-to-360-degree image generation method that operates in the HDR domain and takes advantage of the HDR10 standard. This technique generates high-quality, realistic lighting conditions from textual descriptions, offering flexibility and control in the portrait video relighting task. Unlike previous relighting frameworks, our system performs video relighting directly on-device, enabling real-time inference with real 360-degree HDRI maps. This on-device processing ensures privacy and low runtime, providing an immediate response to changes in lighting conditions or user inputs. Our approach opens new possibilities for real-time video applications, including video conferencing, gaming, and augmented reality, by allowing dynamic, text-based control of lighting conditions.
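For context on what working with HDR10 involves, HDR10 encodes luminance with the SMPTE ST 2084 perceptual quantizer (PQ) transfer function at 10-bit depth. The sketch below shows how absolute luminance in cd/m^2 could be mapped to a 10-bit code value and back. It illustrates the standard PQ curve only. Whether the generator uses exactly this encoding is not stated in the abstract, and the full-range quantization (0 to 1023) and the function names `pq_encode` and `pq_decode` are assumptions.

```python
# SMPTE ST 2084 (PQ) constants.
M1 = 2610 / 16384          # 0.1593017578125
M2 = 2523 / 4096 * 128     # 78.84375
C1 = 3424 / 4096           # 0.8359375
C2 = 2413 / 4096 * 32      # 18.8515625
C3 = 2392 / 4096 * 32      # 18.6875
PEAK_NITS = 10000.0        # PQ is defined up to 10,000 cd/m^2
MAX_CODE = 1023            # assumption: full-range 10-bit quantization

def pq_encode(luminance_nits):
    """Map absolute luminance (cd/m^2) to a 10-bit PQ code value."""
    y = min(max(luminance_nits, 0.0), PEAK_NITS) / PEAK_NITS
    y_m1 = y ** M1
    signal = ((C1 + C2 * y_m1) / (1.0 + C3 * y_m1)) ** M2
    return round(signal * MAX_CODE)

def pq_decode(code):
    """Map a 10-bit PQ code value back to absolute luminance (cd/m^2)."""
    signal = min(max(code, 0), MAX_CODE) / MAX_CODE
    s = signal ** (1.0 / M2)
    y = (max(s - C1, 0.0) / (C2 - C3 * s)) ** (1.0 / M1)
    return y * PEAK_NITS

code_100 = pq_encode(100.0)          # a typical SDR reference white level
roundtrip_100 = pq_decode(code_100)  # close to 100, up to quantization error
```

Because PQ spends code values roughly in proportion to perceptual sensitivity, a 10-bit representation covers the full range from 0 to 10,000 cd/m^2 with small relative error, which is one plausible reason to quantize HDR training data this way.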
