Parametric Shadow Control for Portrait Generation in Text-to-Image Diffusion Models

Haoming Cai, Tsung-Wei Huang, Shiv Gehlot, Brandon Y. Feng, Sachin Shah, Guan-Ming Su, Christopher Metzler

TL;DR

This work introduces Shadow Director, a diffusion-based framework that provides intuitive, parametric shadow control during portrait generation without relying on costly real-world light-stage data. It uses two compact estimators, the Shadow-Depth Estimator and the Identity Estimator, which operate on UNet latent features and are trained on a small synthetic dataset to reveal and manipulate hidden shadow information. Shadow control is performed through test-time optimization of latent features at early denoising steps, guided by a shadow target and an identity reference, so that shadows stay realistic while subject identity is preserved across diverse artistic styles. The results show effective control over shadow strength, placement, and lighting direction with strong identity preservation, pointing to a practical, resource-efficient path for shadow manipulation in diffusion models.
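The test-time optimization described above can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and not the authors' implementation: the two estimators are stand-in frozen linear maps (`shadow_est`, `id_est`), the latent is a short list of floats, and the objective is a plain squared error on the shadow target plus a weighted squared error on the identity reference. What it does share with the description is the structure: only the latent is updated, while both estimators stay fixed.

```python
def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def optimize_latent(z, shadow_est, id_est, shadow_target, id_ref,
                    id_weight=1.0, lr=0.05, steps=200):
    """Gradient descent on the latent only; both (toy, linear) estimators stay frozen.

    Minimizes ||S z - s*||^2 + id_weight * ||I z - e_ref||^2, where S and I are
    hypothetical linear stand-ins for the shadow and identity estimators.
    """
    z = list(z)
    for _ in range(steps):
        # Residuals of each estimator's prediction against its target.
        r_s = [p - t for p, t in zip(matvec(shadow_est, z), shadow_target)]
        r_i = [p - t for p, t in zip(matvec(id_est, z), id_ref)]
        # Analytic gradient of the objective with respect to z.
        grad = [
            2.0 * sum(shadow_est[k][j] * r_s[k] for k in range(len(r_s)))
            + 2.0 * id_weight * sum(id_est[k][j] * r_i[k] for k in range(len(r_i)))
            for j in range(len(z))
        ]
        z = [zj - lr * gj for zj, gj in zip(z, grad)]
    return z


# Toy setup: coordinate 0 of the latent drives "shadow", coordinate 1 drives "identity".
shadow_est = [[1.0, 0.0]]
id_est = [[0.0, 1.0]]
z0 = [0.8, 0.3]
id_ref = matvec(id_est, z0)      # identity embedding from the initial forward pass
z_opt = optimize_latent(z0, shadow_est, id_est, shadow_target=[0.2], id_ref=id_ref)
```

In this toy setup the shadow coordinate moves toward the requested target while the identity coordinate stays where the initial forward pass found it. In a real diffusion pipeline the estimators would be neural networks and the gradient would come from automatic differentiation.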

Abstract

Text-to-image diffusion models excel at generating diverse portraits, but lack intuitive shadow control. Existing editing approaches, applied as post-processing, struggle to offer effective manipulation across diverse styles. Additionally, these methods either rely on expensive real-world light-stage data collection or require extensive computational resources for training. To address these limitations, we introduce Shadow Director, a method that extracts and manipulates hidden shadow attributes within well-trained diffusion models. Our approach uses a small estimation network that requires only a few thousand synthetic images and hours of training, with no costly real-world light-stage data needed. Shadow Director enables parametric and intuitive control over shadow shape, placement, and intensity during portrait generation while preserving artistic integrity and identity across diverse styles. Despite training only on synthetic data built on real-world identities, it generalizes effectively to generated portraits with diverse styles, making it a more accessible and resource-friendly solution.

Paper Structure

This paper contains 38 sections, 24 figures, 5 tables.

Figures (24)

  • Figure 1: Method Overview During Image Generation. Our approach consists of three main components: (1) User Interface (left), which provides intuitive controls for shadow strength, directional light position, and shadow shape; (2) Diffusion Model with Shadow Control (middle), where latent features are optimized at selected denoising steps; and (3) Shadow Director (right), which extracts shadow and identity information from UNet internal features using two estimators. Shadow Director is trained to infer these attributes from noisy feature maps. During generation, shadow control is achieved through test-time optimization of the latent features (marked with a fire symbol) at an early denoising step, which leaves a larger degree of freedom for shadow manipulation. Before optimization begins, both estimators perform an initial forward pass (dashed lines) to obtain the customized shadow and the reference identity embedding. The shadow acquisition process is detailed in Fig. 2. During optimization, latent features are guided to match the customized shadow while maintaining identity consistency. Notably, the only optimizable component in the pipeline is the latent feature at the selected denoising step. Further architectural details of the estimators and feature extraction are provided in the Appendix.
  • Figure 2: Shadow Acquisition. Two options for customizing the shadow map; customization occurs once, before latent optimization begins. (a) Shadow placement and shape: A user-defined binary mask is applied to the estimated shadow map. Masked regions become darker in the customized shadow, explicitly defining shadow areas. (b) Shadow synthesis through directional lighting: Using the estimated depth map and a user-specified directional light position, we implement ray casting to generate geometrically consistent shadows. Users select only one of these two methods. Details of the ray casting are presented in the Appendix.
  • Figure 3: Synthetic Training Dataset. Samples from our synthetic dataset, consisting of paired relit images (top row), shadow maps (middle row), and depth maps (bottom row). Unlike IC-Light, which requires 10 million samples, our approach needs only a few thousand paired synthetic examples. These are generated using GeomConsistentFR (Hou et al., 2022). Despite the limited photorealism of its shadows, this dataset proves sufficient for accessing the shadow information embedded within diffusion models. This demonstrates a promising research direction: reducing dependency on expensive light-stage data while focusing on revealing lighting information already hidden in diffusion models.
  • Figure 4: Shadow intensity control on generated portraits across diverse styles. The top two rows demonstrate gradual shadow intensity control, while the bottom four rows highlight strong and weak shadow variations for better visual comparison. Shadow Director enables parametric control over shadow strength, ranging from weak to strong, while preserving both identity and artistic integrity.
  • Figure 5: Shadow shape control with user-defined masks on diverse portrait images. Shadow Director enables precise control over shadow shapes and placement using user-defined masks (shown as gray overlays).
  • ...and 19 more figures
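
The two shadow acquisition options in the Figure 2 caption can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the paper's ray casting (which is described in its appendix): the shadow map is treated as a shading map where 1.0 is fully lit and 0.0 is fully shadowed, the depth map is treated as a height field where larger values are closer to the light, and the user-specified light position is simplified to a single direction vector. The function names `apply_shadow_mask` and `cast_shadows` are hypothetical.

```python
def apply_shadow_mask(shadow_map, mask, strength=0.5):
    """Option (a): darken masked pixels of a shading map (1.0 = lit, 0.0 = shadowed)."""
    return [
        [s * (1.0 - strength) if m else s for s, m in zip(s_row, m_row)]
        for s_row, m_row in zip(shadow_map, mask)
    ]


def cast_shadows(height, light_dir, max_steps=64):
    """Option (b): height-field ray marching toward a directional light.

    A pixel is marked shadowed (0.0) if, while stepping from that pixel toward
    the light, the surface rises above the ray. `light_dir` = (dx, dy, dz)
    points from the surface toward the light (an assumed convention).
    """
    rows, cols = len(height), len(height[0])
    dx, dy, dz = light_dir
    out = [[1.0] * cols for _ in range(rows)]
    for y in range(rows):
        for x in range(cols):
            for t in range(1, max_steps + 1):
                ix, iy = round(x + dx * t), round(y + dy * t)
                if not (0 <= ix < cols and 0 <= iy < rows):
                    break  # ray left the image without being blocked
                if height[iy][ix] > height[y][x] + dz * t:
                    out[y][x] = 0.0  # an occluder lies between pixel and light
                    break
    return out


# A single ridge at x = 2, lit from the +x side at a shallow elevation.
ridge = [[0.0, 0.0, 2.0, 0.0, 0.0]]
cast = cast_shadows(ridge, light_dir=(1, 0, 0.5))
masked = apply_shadow_mask([[1.0, 0.8]], [[1, 0]], strength=0.5)
```

With the light on the +x side, the two pixels behind the ridge fall into shadow while the ridge itself and the pixels facing the light stay lit. The mask variant only scales the masked pixels and leaves the rest of the map untouched.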