PaintFlow: A Unified Framework for Interactive Oil Paintings Editing and Generation

Zhangli Hu, Ye Chen, Jiajun Yao, Bingbing Ni

TL;DR

PaintFlow tackles the challenge of interactive oil painting generation and editing by introducing a unified diffusion-based framework that accepts text, sketches, and reference images to control content and maintain a consistent oil painting style. It establishes a three-pronged technical approach: (1) a training-time conditioning strategy that aligns spatial and semantic information, (2) a self-supervised Stroke-Based Rendering pipeline to synthesize large oil-painting datasets from real images, and (3) a style-retention mechanism using AdaIN during inference to preserve brushstroke aesthetics. The model demonstrates superior multimodal instruction alignment and style preservation through extensive experiments and ablation studies, enabling fine-grained region-aware editing and generation from blank canvases with interactive feedback. Overall, PaintFlow offers a practical, high-fidelity pipeline for artists and designers to create and edit stylized oil paintings with strong cross-modal control and stylistic consistency.

Abstract

Oil painting, as a high-level medium that blends human abstract thinking with artistic expression, poses substantial challenges for digital generation and editing due to its intricate brushstroke dynamics and stylized characteristics. Existing generation and editing techniques are often constrained by the distribution of their training data and primarily focus on modifying real photographs. In this work, we introduce a unified multimodal framework for oil painting generation and editing. The proposed system allows users to incorporate reference images for precise semantic control, hand-drawn sketches for spatial structure alignment, and natural language prompts for high-level semantic guidance, while maintaining a unified painting style across all outputs. Our method achieves interactive oil painting creation through three technical advances. First, we enhance the training stage with a spatial alignment and semantic enhancement conditioning strategy, which maps masks and sketches into spatial constraints and encodes contextual embeddings from reference images and text into feature constraints, enabling object-level semantic alignment. Second, to overcome data scarcity, we propose a self-supervised style transfer pipeline based on Stroke-Based Rendering (SBR), which simulates the inpainting dynamics of oil painting restoration and converts real images into stylized oil paintings with preserved brushstroke textures, yielding a large-scale paired training dataset. Finally, during inference, we integrate features using the AdaIN operator to ensure stylistic consistency. Extensive experiments demonstrate that our interactive system enables fine-grained editing while preserving the artistic qualities of oil paintings, offering an unprecedented ability to realize imagined scenes in stylized oil painting generation and editing.
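The AdaIN operator mentioned in the abstract is a standard, parameter-free operation: it normalizes each channel of a content feature map and rescales it to the per-channel mean and standard deviation of a style feature map. The NumPy sketch below illustrates only that generic operator. Which features PaintFlow applies it to, and at which inference steps, is not stated here, so the `(C, H, W)` layout, the `eps` value, and the random test tensors are illustrative assumptions.

```python
import numpy as np


def adain(content: np.ndarray, style: np.ndarray, eps: float = 1e-5) -> np.ndarray:
    """Adaptive instance normalization on (C, H, W) feature maps.

    Each content channel is shifted and scaled so that its mean and standard
    deviation match those of the corresponding style channel. The spatial
    sizes of the two inputs may differ; the channel counts must match.
    """
    if content.shape[0] != style.shape[0]:
        raise ValueError("content and style must have the same number of channels")

    # Per-channel statistics over the spatial axes.
    c_mean = content.mean(axis=(1, 2), keepdims=True)
    c_std = np.sqrt(content.var(axis=(1, 2), keepdims=True) + eps)
    s_mean = style.mean(axis=(1, 2), keepdims=True)
    s_std = np.sqrt(style.var(axis=(1, 2), keepdims=True) + eps)

    # Normalize the content, then re-scale with the style statistics.
    return (content - c_mean) / c_std * s_std + s_mean


# Hypothetical feature maps standing in for edited-region and source-painting features.
rng = np.random.default_rng(0)
content_feat = rng.normal(0.0, 1.0, size=(4, 8, 8))
style_feat = rng.normal(3.0, 2.0, size=(4, 16, 16))
stylized = adain(content_feat, style_feat)
```

Because the operator has no learned parameters, it can be applied at inference time without retraining, which is consistent with the abstract's description of it as an inference-stage step.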

Paper Structure

This paper contains 19 sections, 10 equations, 10 figures, 2 tables, 2 algorithms.

Figures (10)

  • Figure 1: PaintFlow is an interactive oil painting editing and generation system. It enables users to flexibly combine text, reference images, and sketches for fine-grained control. Please zoom in to capture more details.
  • Figure 2: Overview of our framework. During training, we follow a condition alignment paradigm by feeding the mask and sketch as additional channels into the denoising process. A semantic enhancement strategy extracts fine-grained features from the reference image and the frozen pretrained text embedding, and supplies them as context embeddings via cross-attention, ensuring detailed semantic fidelity. During inference, we use a training-free AdaIN operator to align the output with the source style. Prompt features from the frozen CLIP text encoder are fused with a learnable hyperparameter $\lambda$, enabling visually satisfactory editing.
  • Figure 3: Qualitative comparison with state-of-the-art methods. We showcase the inpainting results of previous methods and ours. As shown, the results of existing methods suffer from artifacts, irregular edges, and photorealistic regions that are inconsistent with the oil painting style. Our method produces significantly better oil paintings than the other methods. Please zoom in to capture more details.
  • Figure 4: Flow Painting Process. We demonstrate how our system generates a high-fidelity oil painting in four steps.
  • Figure 5: Ablation Study. Our method is equipped with a dedicated module for each condition, and removing any modality degrades the fidelity of the generated image.
  • ...and 5 more figures
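
The Figure 2 caption describes feeding the mask and sketch "as additional channels into the denoising process". A minimal sketch of what such channel-wise conditioning can look like is given below. The latent channel count (4), the 32×32 resolution, the single-channel mask and sketch, and the function name `build_denoiser_input` are all hypothetical choices for illustration; the text here does not specify the actual input layout.

```python
import numpy as np


def build_denoiser_input(noisy_latent: np.ndarray,
                         mask: np.ndarray,
                         sketch: np.ndarray) -> np.ndarray:
    """Stack spatial conditions onto the noisy latent along the channel axis.

    noisy_latent: (C, H, W) latent at the current denoising step.
    mask:         (H, W) map, 1 inside the region to edit, 0 elsewhere.
    sketch:       (H, W) edge map giving the desired spatial structure.
    Returns an array of shape (C + 2, H, W).
    """
    spatial = noisy_latent.shape[1:]
    if mask.shape != spatial or sketch.shape != spatial:
        raise ValueError("mask and sketch must match the latent's spatial size")

    dtype = noisy_latent.dtype
    return np.concatenate(
        [noisy_latent, mask[None].astype(dtype), sketch[None].astype(dtype)],
        axis=0,
    )


# Hypothetical shapes: a 4-channel latent with a square edit region.
rng = np.random.default_rng(0)
latent = rng.normal(size=(4, 32, 32)).astype(np.float32)
edit_mask = np.zeros((32, 32), dtype=np.float32)
edit_mask[8:24, 8:24] = 1.0
edge_sketch = np.zeros((32, 32), dtype=np.float32)
edge_sketch[16, 8:24] = 1.0  # a single horizontal stroke

denoiser_input = build_denoiser_input(latent, edit_mask, edge_sketch)
```

Under this kind of design, the denoiser's first layer would need to accept the extra channels, so the spatial conditions are pixel-aligned with the latent rather than injected through attention.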