Data-Efficient Brushstroke Generation with Diffusion Models for Oil Painting

Dantong Qin, Alessandro Bozzon, Xian Yang, Xun Zhang, Yike Guo, Pan Wang

TL;DR

This work studies the problem of learning human-like brushstroke generation from a small set of hand-drawn samples and proposes StrokeDiff, a diffusion-based framework with Smooth Regularization (SmR), providing a simple mechanism to stabilize diffusion models under sparse supervision without altering the inference process.

Abstract

Many creative multimedia systems are built upon visual primitives such as strokes or textures, which are difficult to collect at scale and fundamentally different from natural image data. This data scarcity makes it challenging for modern generative models to learn expressive and controllable primitives, limiting their use in process-aware content creation. In this work, we study the problem of learning human-like brushstroke generation from a small set of hand-drawn samples (n=470) and propose StrokeDiff, a diffusion-based framework with Smooth Regularization (SmR). SmR injects stochastic visual priors during training, providing a simple mechanism to stabilize diffusion models under sparse supervision without altering the inference process. We further show how the learned primitives can be made controllable through a Bézier-based conditioning module and integrated into a complete stroke-based painting pipeline, including prediction, generation, ordering, and compositing. This demonstrates how data-efficient primitive modeling can support expressive and structured multimedia content creation. Experiments indicate that the proposed approach produces diverse and structurally coherent brushstrokes and enables paintings with richer texture and layering, validated by both automatic metrics and human evaluation.
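
The ordering and compositing stages of the pipeline can be illustrated with a small sketch. It is not the authors' implementation. It assumes each stroke carries a ranking score, that lower scores are painted first, and that strokes are blended with the standard "over" operator. The dictionary keys (`score`, `color`, `alpha`, `mask`) and the function name `composite` are hypothetical, and a flat list of grayscale values stands in for an image.

```python
def composite(canvas, strokes):
    """Paint strokes onto a grayscale canvas in ascending ranking-score order.

    Assumption: lower score = painted earlier. The source does not state the
    sort direction or the blending rule; "over" blending is used here.
    """
    ordered = sorted(strokes, key=lambda s: s["score"])
    out = list(canvas)
    for s in ordered:
        # Per-pixel "over" blend: the stroke covers the canvas where mask * alpha is high.
        out = [
            s["alpha"] * m * s["color"] + (1.0 - s["alpha"] * m) * c
            for c, m in zip(out, s["mask"])
        ]
    return out


# Two overlapping strokes on a two-pixel canvas, listed in the "wrong" order.
strokes = [
    {"score": 0.9, "color": 0.5, "alpha": 1.0, "mask": [0.0, 1.0]},
    {"score": 0.2, "color": 1.0, "alpha": 1.0, "mask": [1.0, 1.0]},
]
painted = composite([0.0, 0.0], strokes)
```

Because the strokes are sorted before blending, the result does not depend on the order in which they were predicted, which is the property the ordering stage is meant to provide.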

Paper Structure (39 sections, 31 equations, 16 figures, 7 tables)

This paper contains 39 sections, 31 equations, 16 figures, 7 tables.

Figures (16)

  • Figure 1: Comparison of stroke quality and painting outputs. (a) Prior SBR methods use geometric or template strokes with limited expressive capacity. Our strokes, learned from human-painted data, yield richer structure and texture. (b) Target oil painting style, showing abstraction beyond realism. (c) Compared to Huang et al. huang2019learning, our method introduces a stronger artistic domain shift.
  • Figure 2: Overview of our method. (A) Smooth Regularization injects priors during training to mitigate mode collapse under limited data. At each timestep, a brushstroke $x_s$ is sampled from the dataset and injected into the forward process alongside a newly sampled noise term $\epsilon^* \sim \mathcal{N}(0, I)$; both terms are scaled by a factor $\eta \sim \mathrm{Uni}[0, \Upsilon]$. (B) The StrokeDiff framework is trained in two stages. In Stage 1, the model learns to generate strokes with SmR but without conditional inputs. In Stage 2, we incorporate parameter conditioning via a raster-to-vector module that predicts vectorized stroke parameters (e.g., control points, opacity, RGB, width). These parameters are then used to control the generation of specific brushstroke attributes.
  • Figure 3: Stroke generation using Stable Diffusion v1-5 rombach2022high fine-tuned on typical stroke data that lacks sufficient diversity. Due to severe mode collapse early in training, the model produces only texture-like images with minimal variation and no authentic brushstroke resemblance.
  • Figure 4: Illustration of the painting pipeline. Stroke parameters are reordered according to the predicted ranking scores $src_r$ to ensure a coherent rendering sequence.
  • Figure 5: Strokes that are optimally matched but not reordered are rendered in an arbitrary sequence, leading to incorrect overlaps.
  • ...and 11 more figures
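
The Smooth Regularization step described in the Figure 2 caption can be sketched as follows. This is a minimal illustration, not the authors' code. The caption states that a dataset stroke $x_s$ and a fresh noise term $\epsilon^*$ are both scaled by $\eta \sim \mathrm{Uni}[0, \Upsilon]$ and injected into the forward process, but it does not give the exact formula. The additive form $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon + \eta\,(x_s + \epsilon^*)$ used below is an assumption, as are the names `smr_forward`, `alpha_bar_t`, and `upsilon`. Flat lists of floats stand in for stroke images.

```python
import math
import random


def smr_forward(x0, dataset, alpha_bar_t, upsilon, rng):
    """Noise x0 to a timestep, then add an SmR perturbation.

    Assumption: the perturbation is added to the standard DDPM forward sample.
    Returns the perturbed sample, the base noise eps, and the sampled eta.
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x0]        # standard forward-process noise
    x_s = rng.choice(dataset)                      # brushstroke sampled from the dataset
    eps_star = [rng.gauss(0.0, 1.0) for _ in x0]   # newly sampled noise term
    eta = rng.uniform(0.0, upsilon)                # eta ~ Uni[0, Upsilon]
    a = math.sqrt(alpha_bar_t)
    b = math.sqrt(1.0 - alpha_bar_t)
    x_t = [
        a * x + b * e + eta * (s + es)
        for x, e, s, es in zip(x0, eps, x_s, eps_star)
    ]
    return x_t, eps, eta


# Toy dataset of three four-pixel "strokes".
rng = random.Random(0)
dataset = [[0.1, 0.9, 0.9, 0.1], [0.0, 0.5, 1.0, 0.5], [1.0, 1.0, 0.0, 0.0]]
x_t, eps, eta = smr_forward(dataset[0], dataset, alpha_bar_t=0.5, upsilon=0.2, rng=rng)
```

Setting `upsilon` to 0 recovers the unperturbed forward process in this sketch. The perturbation appears only in the training-time noising step, consistent with the statement that SmR leaves inference unchanged.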