EdgeRelight360: Text-Conditioned 360-Degree HDR Image Generation for Real-Time On-Device Video Portrait Relighting

Min-Hui Lin; Mahesh Reddy; Guillaume Berger; Michel Sarkis; Fatih Porikli; Ning Bi

TL;DR

EdgeRelight360 tackles privacy-preserving, real-time, on-device video portrait relighting by synthesizing 360-degree HDRI environment maps conditioned on text prompts. It introduces a diffusion-based text-to-360-degree HDRI generator trained on quantized HDR data, together with a lightweight, edge-friendly relighting pipeline: a Geometry Net predicts surface normals, and a light-adding renderer applies diffuse lighting from the HDRI, all running on mobile hardware with INT8 quantization. The approach achieves real-time on-device inference (about $25$ fps end-to-end) and produces temporally stable, photorealistic relighting in dynamic video scenarios, outperforming remote HITL and prior on-device methods in both speed and visual quality. The work demonstrates practical impact for mobile video conferencing, gaming, and AR by enabling lighting control directly from textual descriptions while preserving user privacy. Key contributions include end-to-end on-device HDRI generation from text, a lightweight relighting pipeline, quantization-aware training, and extensive on-device evaluation against existing HDRI-based and relighting methods.
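
For intuition, the diffuse term that a renderer of this kind draws from an environment map can be approximated with the standard Lambertian irradiance integral: radiance from every direction, weighted by the clamped cosine against the surface normal. The sketch below is a generic illustration and not the paper's renderer; the function name `diffuse_irradiance`, the equirectangular axis convention (+y up), and the list-of-rows map layout are all assumptions made for the example.

```python
import math

def diffuse_irradiance(env, normal):
    """Approximate Lambertian irradiance for one unit normal.

    env    : list of rows, each a list of (r, g, b) linear radiance values,
             laid out as an equirectangular map (assumed convention: +y is up).
    normal : unit vector (x, y, z).
    """
    height, width = len(env), len(env[0])
    total = [0.0, 0.0, 0.0]
    for row in range(height):
        theta = math.pi * (row + 0.5) / height  # polar angle measured from +y
        # Solid angle covered by one pixel in this row.
        d_omega = (math.pi / height) * (2.0 * math.pi / width) * math.sin(theta)
        for col in range(width):
            phi = 2.0 * math.pi * (col + 0.5) / width
            direction = (
                math.sin(theta) * math.cos(phi),
                math.cos(theta),
                math.sin(theta) * math.sin(phi),
            )
            cosine = sum(n * d for n, d in zip(normal, direction))
            if cosine <= 0.0:
                continue  # light arriving from behind the surface contributes nothing
            for c in range(3):
                total[c] += env[row][col][c] * cosine * d_omega
    return tuple(total)

# Hypothetical test maps: uniform white, and a map lit only in its upper half.
H, W = 32, 64
uniform_env = [[(1.0, 1.0, 1.0)] * W for _ in range(H)]
sky_only_env = [[(1.0, 1.0, 1.0) if r < H // 2 else (0.0, 0.0, 0.0)] * W
                for r in range(H)]
```

Under a uniform unit-radiance map the integral evaluates to roughly pi for any normal, which makes a convenient sanity check. In a per-pixel pipeline the same sum would be evaluated (or precomputed) for each predicted normal.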

Abstract

In this paper, we present EdgeRelight360, an approach for real-time video portrait relighting on mobile devices, utilizing text-conditioned generation of 360-degree high dynamic range image (HDRI) maps. Our method proposes diffusion-based text-to-360-degree image generation in the HDR domain, taking advantage of the HDR10 standard. This technique facilitates the generation of high-quality, realistic lighting conditions from textual descriptions, offering flexibility and control in the portrait video relighting task. Unlike previous relighting frameworks, our proposed system performs video relighting directly on-device, enabling real-time inference with real 360-degree HDRI maps. This on-device processing ensures both privacy and low runtime, providing an immediate response to changes in lighting conditions or user inputs. Our approach paves the way for new possibilities in real-time video applications, including video conferencing, gaming, and augmented reality, by allowing dynamic, text-based control of lighting conditions.

Paper Structure (21 sections, 5 equations, 16 figures, 4 tables)

Introduction
Related work
Text-to-image
Portrait relighting
Text-conditioned 360-degree HDRI map generation
Quantized HDR images
Training setup
Video portrait relighting
Geometry Net
Average Temporal Filter
Light Adding based Rendering
On-device inference
Results
Text to 360-degree HDRI map generation
Video portrait relighting
...and 6 more sections

Figures (16)

Figure 1: Our proposed method for generating a 360-degree environment map from a text prompt, followed by video portrait relighting in real time on mobile devices.
Figure 2: We propose to combine PQ inverse EOTF used in the HDR10 standard with 8-bit quantization to obtain the quantized HDRI maps to generate the training dataset. Similarly, dequantization and the PQ EOTF can be performed to recover the original HDRI map.
Figure 3: Encoding and decoding (a) quantized HDRI map with a pre-trained VAE leads to significant artifacts such as (b) blue patches and RGB color distortions. However, these issues can be resolved by (c) fine-tuning the VAE on quantized HDRI maps, which reproduces the quantized images close to the (d) dequantized original images.
Figure 4: To augment the perspective HDRI dataset, we generate 20 perspective HDR images for every 360-degree equirectangular HDRI map. The images are tone mapped for visualization purposes.
Figure 5: An overview of the (1) text to 360-degree training setup along with (2) the quantized HDRI generation. The generated quantized image can be (3) dequantized with inverse PQ transformation to produce the final HDRI map.
...and 11 more figures
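
The quantization round trip described in the Figure 2 caption (PQ inverse EOTF followed by 8-bit quantization, then dequantization and the PQ EOTF) can be sketched as follows. The curve constants are those of the PQ transfer function used by HDR10 (SMPTE ST 2084). Everything else is an assumption for illustration: the function names, the per-value scalar interface, and in particular the mapping of HDRI radiance to absolute luminance with a 10000-nit peak, which the captions do not specify.

```python
# PQ (SMPTE ST 2084) constants.
M1 = 2610 / 16384
M2 = 2523 / 4096 * 128
C1 = 3424 / 4096
C2 = 2413 / 4096 * 32
C3 = 2392 / 4096 * 32
PEAK_NITS = 10000.0  # assumed normalization of the HDR values

def pq_inverse_eotf(luminance):
    """Map linear luminance in [0, PEAK_NITS] to a PQ signal in [0, 1]."""
    y = min(max(luminance / PEAK_NITS, 0.0), 1.0)
    y_m1 = y ** M1
    return ((C1 + C2 * y_m1) / (1.0 + C3 * y_m1)) ** M2

def pq_eotf(signal):
    """Map a PQ signal in [0, 1] back to linear luminance."""
    n = min(max(signal, 0.0), 1.0) ** (1.0 / M2)
    return PEAK_NITS * (max(n - C1, 0.0) / (C2 - C3 * n)) ** (1.0 / M1)

def quantize(luminance):
    """HDR value -> 8-bit code (0..255) via the PQ inverse EOTF."""
    return round(pq_inverse_eotf(luminance) * 255)

def dequantize(code):
    """8-bit code -> approximate HDR value via the PQ EOTF."""
    return pq_eotf(code / 255)

# Round trip for a mid-range value; the recovered value differs from the
# input only by the 8-bit quantization error.
code_100 = quantize(100.0)
recovered_100 = dequantize(code_100)
```

Because PQ spaces its code values roughly perceptually, 8 bits cover the full range from 0 to the assumed peak with a relative error of a few percent per step, which is what makes storing HDR content in ordinary 8-bit images plausible for training.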
