LumiCtrl: Learning Illuminant Prompts for Lighting Control in Personalized Text-to-Image Models
Muhammad Atif Butt, Kai Wang, Javier Vazquez-Corral, Joost Van De Weijer
TL;DR
LumiCtrl addresses the lack of explicit illuminant control in text-to-image (T2I) diffusion models by learning illuminant prompts from a single image. It combines physics-based augmentation along the Planckian locus, edge-guided prompt disentanglement via a frozen ControlNet, and a foreground-focused masked reconstruction loss to achieve contextual light adaptation. The work also diagnoses a semantic gap in how text encoders ground illuminant terms, and reports better illuminant fidelity, aesthetic quality, and scene coherence than prior personalization and editing methods, a result supported by a user study. This enables precise, content-preserving lighting control in personalized T2I generation, with potential applications in design and visual storytelling.
Abstract
Current text-to-image (T2I) models have made remarkable progress in creative image generation, yet they still lack precise control over scene illuminants, a crucial factor for content designers who want to manipulate the mood, atmosphere, and visual aesthetics of generated images. In this paper, we present LumiCtrl, an illuminant personalization method that learns an illuminant prompt from a single image of an object. LumiCtrl has three components: (a) physics-based illuminant augmentation along the Planckian locus, which creates fine-tuning variants of the input image under standard illuminants; (b) edge-guided prompt disentanglement using a frozen ControlNet, which ensures that the prompts focus on illumination rather than structure; and (c) a masked reconstruction loss that focuses learning on the foreground object while allowing the background to adapt contextually, enabling what we call contextual light adaptation. We compare LumiCtrl qualitatively and quantitatively against other T2I customization methods. The results show that our method achieves significantly better illuminant fidelity, aesthetic quality, and scene coherence than existing personalization baselines. A human preference study further confirms a strong user preference for LumiCtrl outputs. The code and data will be released upon publication.
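To make components (a) and (c) more concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a black-body illuminant can be approximated by evaluating Planck's law at one representative wavelength per RGB channel, that 6500 K is the neutral reference, and that images are linear RGB arrays in [0, 1]. All function names (`illuminant_gains`, `relight`, `masked_reconstruction_loss`) and the chosen wavelengths are hypothetical and introduced only for illustration.

```python
import numpy as np

# Physical constants (SI units).
H = 6.62607015e-34   # Planck constant, J*s
C = 2.99792458e8     # speed of light, m/s
K_B = 1.380649e-23   # Boltzmann constant, J/K

# Assumption: one representative wavelength per R, G, B channel (metres).
RGB_WAVELENGTHS = np.array([610e-9, 550e-9, 465e-9])


def planck_radiance(wavelength_m, temperature_k):
    """Black-body spectral radiance given by Planck's law."""
    exponent = H * C / (wavelength_m * K_B * temperature_k)
    return (2.0 * H * C**2 / wavelength_m**5) / np.expm1(exponent)


def illuminant_gains(temperature_k, reference_k=6500.0):
    """Per-channel gains relative to the reference, normalized so green = 1."""
    ratio = (planck_radiance(RGB_WAVELENGTHS, temperature_k)
             / planck_radiance(RGB_WAVELENGTHS, reference_k))
    return ratio / ratio[1]


def relight(linear_rgb, temperature_k):
    """Scale an HxWx3 linear-RGB image by the gains and clip to [0, 1]."""
    return np.clip(linear_rgb * illuminant_gains(temperature_k), 0.0, 1.0)


def masked_reconstruction_loss(pred, target, mask):
    """Mean squared error restricted to foreground pixels (mask == 1)."""
    mask = mask[..., None].astype(pred.dtype)   # broadcast HxW over channels
    squared_error = mask * (pred - target) ** 2
    n_terms = max(mask.sum() * pred.shape[-1], 1.0)  # avoid division by zero
    return squared_error.sum() / n_terms
```

Under these assumptions, a low temperature such as 2850 K yields a red gain above 1 and a blue gain below 1 (a warm cast), while 10000 K does the opposite. The masked loss ignores errors wherever the mask is 0, which is the property that leaves the background free to adapt.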
