Style-Aware Gloss Control for Generative Non-Photorealistic Rendering

Santiago Jimenez-Navarro; Belen Masia; Ana Serrano

TL;DR

This work investigates how gloss and artistic style are represented, and how they can be controlled, in non-photorealistic rendering (NPR). The authors learn a hierarchical, unsupervised latent space with StyleGAN2-ADA and a pSp encoder, and find a dedicated gloss dimension localized to Layer 6 and style localized to Layer 8 within a $W^+$ space. A 10,080-sample Stylized Gloss dataset spanning three painterly styles enables disentanglement of gloss from other appearance factors, exposing layer-wise factor encoding and strong linear mappings between Layer 6 representations and gloss levels. Building on this, the authors introduce a lightweight diffusion adapter that conditions a latent-diffusion model on $W^+$ embeddings to achieve fine-grained control of gloss and style, while geometry and color are controlled through text prompts and spatial cues (ControlNet for edges, albedo maps). The approach is reported to outperform prior NPR stylization methods in disentanglement, controllability, and fidelity to reference style and gloss, as shown both qualitatively and in a user study, and it highlights pathways for extending controllable, perceptually grounded generative tools. Its practical impact lies in enabling precise, interpretable editing of NPR appearance for graphics, art, and design, while contributing to the broader understanding of how perceptual factors organize in learned latent representations.
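To make the layer-wise picture concrete, the following is a minimal sketch of what editing a single layer of a $W^+$ code and fitting a linear relation to a gloss level could look like. It is not the authors' implementation. The per-layer width of 512, the random placeholder for the gloss direction, the use of the reported layer numbers directly as list indices, and the helper names `traverse_layer` and `fit_line` are all assumptions made for illustration; the summary does not state how the gloss direction is obtained or whether layers are counted from 0 or 1.

```python
import random

NUM_LAYERS = 16   # number of layers in the learned latent space (from the summary)
LATENT_DIM = 512  # assumption: typical StyleGAN2 per-layer width
GLOSS_LAYER = 6   # layer reported to capture gloss (indexing convention assumed)
STYLE_LAYER = 8   # layer reported to capture style (indexing convention assumed)


def traverse_layer(w_plus, layer, direction, step):
    """Return a copy of w_plus with step * direction added to one layer only."""
    edited = [list(row) for row in w_plus]
    edited[layer] = [w + step * d for w, d in zip(edited[layer], direction)]
    return edited


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept in one dimension."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    var_x = sum((x - mean_x) ** 2 for x in xs)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = cov_xy / var_x
    return slope, mean_y - slope * mean_x


# Placeholder data: a random W+ code and a random "gloss direction".
rng = random.Random(0)
w_plus = [[rng.gauss(0, 1) for _ in range(LATENT_DIM)] for _ in range(NUM_LAYERS)]
gloss_dir = [rng.gauss(0, 1) for _ in range(LATENT_DIM)]

# Walk along the direction in the gloss layer, leaving all other layers fixed,
# and record the projection of the edited layer onto the direction.
steps = [-2.0, -1.0, 0.0, 1.0, 2.0]
projections = []
for step in steps:
    edited = traverse_layer(w_plus, GLOSS_LAYER, gloss_dir, step)
    projections.append(dot(edited[GLOSS_LAYER], gloss_dir))

# Fit a line that maps the layer projection back to the step taken,
# standing in for a hypothetical gloss level.
slope, intercept = fit_line(projections, steps)
```

With synthetic data the fit is exact by construction, so this only checks the mechanics and says nothing about the reported result. In a real setting the steps would be replaced by ground-truth gloss levels and the projections by encoder outputs for the corresponding images.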

Abstract

Humans can infer material characteristics of objects from their visual appearance, and this ability extends to artistic depictions, where similar perceptual strategies guide the interpretation of paintings or drawings. Among the factors that define material appearance, gloss, along with color, is widely regarded as one of the most important, and recent studies indicate that humans can perceive gloss independently of the artistic style used to depict an object. To investigate how gloss and artistic style are represented in learned models, we train an unsupervised generative model on a newly curated dataset of painterly objects designed to systematically vary such factors. Our analysis reveals a hierarchical latent space in which gloss is disentangled from other appearance factors, allowing for a detailed study of how gloss is represented and varies across artistic styles. Building on this representation, we introduce a lightweight adapter that connects our style- and gloss-aware latent space to a latent-diffusion model, enabling the synthesis of non-photorealistic images with fine-grained control of these factors. We compare our approach with previous models and observe improved disentanglement and controllability of the learned factors.

Paper Structure (31 sections, 3 equations, 24 figures, 2 tables)

Introduction
Related Work
Image Stylization in Non-photorealistic Rendering
Perception in Non-photorealistic Rendering
Generative Models and Human Perception
Disentanglement of Style and Gloss in $W^+$ space
Preliminaries
A Dataset for Stylized Gloss
Architecture
Analysis of the Model
Reconstruction Capabilities of the Pipeline
Internal Organization of the Latent Space
Embedding of Categorical Factors
Embedding of Gloss
Ablation on the Number of Styles
...and 16 more sections

Figures (24)

Figure 1: Left: We train a pSp–StyleGAN2 pipeline that learns a 16-layer latent space where appearance factors such as gloss and artistic style emerge in a disentangled and hierarchical manner. Specific layers specialize in different attributes, e.g., Layer 6 captures gloss and Layer 8 captures style. Right: This learned space enables intuitive control of appearance. Given a reference geometry (edges) and optional albedo map, the user can employ our diffusion-based pipeline to transfer the gloss and style of an input drawing (inset) to new objects, or traverse the gloss dimension to obtain predictable variations from matte to glossy while keeping other factors stable.
Figure 2: Example of brushstroke map extraction for a matte sphere painted in charcoal style. Artist-painted reference (left), corresponding photorealistic rendering (middle), and the estimated brushstroke map extracted from the pair (right).
Figure 3: Application of the brushstroke map $s$ to generate controlled style-guidance samples. From top to bottom, the rows show 1) the photorealistic reference spheres with gradually varying roughness; 2) the stylized spheres obtained with the estimated brushstroke map $s$; 3) the result of applying StyLit to the reference geometry $g$; and 4) the corresponding samples from the original Subias dataset [subias2025artistinator]. Note that gloss increases in a more controlled and continuous manner in our samples than in the reference.
Figure 4: Diagram of the architecture, composed of the pSp encoder, which constructs a layer-wise latent space through map2style layers, and the StyleGAN2 generator, which synthesizes images conditioned on these layer-wise latent representations.
Figure 5: Traversals at different depths of the latent space produce interpretable changes in appearance. Starting from the embeddings of a source image (left), moving along early, intermediate, and late layers respectively induces variations in geometry and illumination, gloss and style, or color.
...and 19 more figures
