Text2Relight: Creative Portrait Relighting with Text Guidance

Junuk Cha; Mengwei Ren; Krishna Kumar Singh; He Zhang; Yannick Hold-Geoffroy; Seunghyun Yoon; HyunJoon Jung; Jae Shin Yoon; Seungryul Baek

TL;DR

Text2Relight addresses the challenge of text-driven portrait relighting by proposing a scalable data synthesis pipeline and a lighting-focused foundational diffusion model. It combines hierarchical text prompts generated by large language models, text-conditioned lighting image generation (RGB and HDR panorama), and image-based relighting for both foreground and background using a point-light representation and inverse rendering. The model is trained by repurposing InstructPix2Pix with auxiliary tasks (shadow removal and light positioning) and a targeted loss to align generated lighting with text prompts, achieving superior fidelity and content preservation compared to baselines. This approach enables creative, text-guided relighting of in-the-wild portraits and broad applications in portrait editing, though the authors note limitations in background lighting realism and spatial context understanding.

Abstract

We present a lighting-aware image editing pipeline that, given a portrait image and a text prompt, performs single-image relighting. Our model modifies the lighting and color of both the foreground and background to align with the provided text description. Because text is unbounded in its creativity, it can describe the lighting of a scene through any sensory feature, including temperature, emotion, smell, and time. However, modeling the mapping between such unbounded text and lighting is extremely challenging: no scalable dataset provides text and relighting pairs in large quantities, and current text-driven image editing models therefore do not generalize to lighting-specific use cases. We overcome this problem by introducing a novel data synthesis pipeline. First, diverse and creative text prompts that describe scenes with various lighting are automatically generated under a crafted hierarchy using a large language model (*e.g.,* ChatGPT). Next, a text-guided image generation model creates a lighting image that best matches the text. Conditioned on the lighting image, we then perform image-based relighting for both foreground and background using a single portrait image or a set of OLAT (One-Light-at-A-Time) images captured from a light stage system. For background relighting in particular, we represent the lighting image as a set of point lights and transfer them to other background images. Finally, a generative diffusion model is trained on the synthesized large-scale data with auxiliary task augmentation (*e.g.,* portrait delighting and light positioning) to correlate the latent text and lighting distributions for text-guided portrait relighting.
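As a rough illustration of the image-based relighting step with OLAT images, the sketch below uses the standard linear light-transport formulation: a relit image is a per-channel weighted sum of the OLAT captures. It is not the authors' implementation. The function name `relight_from_olat` and the nested-list image layout are assumptions made for this example, and the step that derives per-light weights from a generated lighting image is omitted.

```python
def relight_from_olat(olat_images, light_weights):
    """Relight by a weighted sum of OLAT images (linear light transport).

    olat_images: list of N images, each an H x W x 3 nested list in linear RGB
        (hypothetical layout chosen for this sketch).
    light_weights: list of N (r, g, b) intensities, one per light.
    """
    if not olat_images or len(olat_images) != len(light_weights):
        raise ValueError("need one RGB weight per OLAT image")
    height, width = len(olat_images[0]), len(olat_images[0][0])
    # Start from a black image and accumulate each light's contribution.
    out = [[[0.0, 0.0, 0.0] for _ in range(width)] for _ in range(height)]
    for image, weight in zip(olat_images, light_weights):
        for y in range(height):
            for x in range(width):
                for c in range(3):
                    out[y][x][c] += image[y][x][c] * weight[c]
    return out


# Two 1x1 OLAT captures and one RGB weight per light.
olat = [[[[1.0, 1.0, 1.0]]], [[[0.5, 0.0, 0.0]]]]
weights = [(1.0, 0.5, 0.0), (2.0, 2.0, 2.0)]
relit = relight_from_olat(olat, weights)
```

The sum is only valid in linear color space, so gamma-encoded captures would need to be linearized first.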
