Style-Aware Gloss Control for Generative Non-Photorealistic Rendering
Santiago Jimenez-Navarro, Belen Masia, Ana Serrano
TL;DR
This work investigates how gloss and artistic style are represented, and how they can be controlled, in non-photorealistic rendering (NPR). The authors learn a hierarchical latent space without supervision using StyleGAN2-ADA and a pSp encoder, and find that within the $W^+$ space a dedicated gloss dimension is localized to layer 6 and style to layer 8. A 10,080-sample Stylized Gloss dataset spanning three painterly styles enables gloss to be disentangled from other appearance factors, and the analysis shows layer-wise factor encoding and strong linear mappings between layer 6 representations and gloss levels. Building on this, the authors introduce a lightweight diffusion adapter that conditions a latent-diffusion model on $W^+$ embeddings to achieve fine-grained control of gloss and style, together with control of geometry and color through text prompts and spatial cues (ControlNet for edges, albedo maps). The approach outperforms prior NPR stylization methods in disentanglement, controllability, and fidelity to reference style and gloss, as shown both qualitatively and in a user study, and it points to ways of extending controllable, perceptually grounded generative tools. Its practical impact lies in enabling precise, interpretable editing of NPR appearance for graphics, art, and design, and it contributes to the broader understanding of how perceptual factors are organized in learned latent representations.
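To make the layer-wise picture concrete, the following is a minimal sketch, not the authors' implementation, of two operations the summary implies: copying a single $W^+$ layer from a reference code into a target code, and fitting a linear map from per-layer vectors to gloss levels. The layer count (14), latent width (512), zero-based indexing of layers 6 and 8, the function names, and the random placeholder codes are all assumptions made for illustration; real codes would come from the pSp encoder.

```python
import numpy as np

NUM_LAYERS = 14   # assumption: depends on the generator resolution
LATENT_DIM = 512  # assumption: a common StyleGAN2 latent width
GLOSS_LAYER = 6   # layer named in the text; zero-based indexing is an assumption
STYLE_LAYER = 8   # layer named in the text; zero-based indexing is an assumption


def transfer_layers(w_plus_target, w_plus_reference, layers):
    """Return a copy of the target W+ code with the given layers taken from the reference."""
    if w_plus_target.shape != w_plus_reference.shape:
        raise ValueError("W+ codes must have the same shape")
    edited = w_plus_target.copy()
    index = list(layers)
    edited[index] = w_plus_reference[index]
    return edited


def fit_linear_probe(layer_vectors, gloss_levels):
    """Least-squares fit of gloss_levels ~ layer_vectors @ weights + bias."""
    ones = np.ones((layer_vectors.shape[0], 1))
    design = np.hstack([layer_vectors, ones])  # append a bias column
    coef, *_ = np.linalg.lstsq(design, gloss_levels, rcond=None)
    return coef[:-1], coef[-1]


# Placeholder codes drawn at random, standing in for encoder outputs.
rng = np.random.default_rng(0)
target = rng.normal(size=(NUM_LAYERS, LATENT_DIM))
reference = rng.normal(size=(NUM_LAYERS, LATENT_DIM))

# Take the gloss of the reference while keeping every other layer of the target.
edited = transfer_layers(target, reference, [GLOSS_LAYER])
```

Passing `[STYLE_LAYER]` instead would borrow the reference style under the same assumptions. The probe is an ordinary least-squares fit, used here only as one simple way to test for a linear relationship between a layer and a gloss label.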
Abstract
Humans can infer material characteristics of objects from their visual appearance, and this ability extends to artistic depictions, where similar perceptual strategies guide the interpretation of paintings or drawings. Among the factors that define material appearance, gloss, along with color, is widely regarded as one of the most important, and recent studies indicate that humans can perceive gloss independently of the artistic style used to depict an object. To investigate how gloss and artistic style are represented in learned models, we train an unsupervised generative model on a newly curated dataset of painterly objects designed to systematically vary such factors. Our analysis reveals a hierarchical latent space in which gloss is disentangled from other appearance factors, allowing for a detailed study of how gloss is represented and varies across artistic styles. Building on this representation, we introduce a lightweight adapter that connects our style- and gloss-aware latent space to a latent-diffusion model, enabling the synthesis of non-photorealistic images with fine-grained control of these factors. We compare our approach with previous models and observe improved disentanglement and controllability of the learned factors.
