AttentionPainter: An Efficient and Adaptive Stroke Predictor for Scene Painting

Yizhe Tang, Yue Wang, Teng Hu, Ran Yi, Xin Tan, Lizhuang Ma, Yu-Kun Lai, Paul L. Rosin

TL;DR

AttentionPainter is an efficient and adaptive model for single-step neural painting. Its novel scalable stroke predictor outputs a large number of stroke parameters in a single forward pass, and the model outperforms state-of-the-art neural painting methods.

Abstract

Stroke-based Rendering (SBR) aims to decompose an input image into a sequence of parameterized strokes, which can be rendered into a painting that resembles the input image. Recently, Neural Painting methods that use deep learning and reinforcement learning models to predict the stroke sequences have been developed, but they suffer from long inference times or unstable training. To address these issues, we propose AttentionPainter, an efficient and adaptive model for single-step neural painting. First, we propose a novel scalable stroke predictor, which predicts a large number of stroke parameters within a single forward pass instead of the iterative prediction used by previous Reinforcement Learning or auto-regressive methods; this makes AttentionPainter faster than previous neural painting methods. To further increase training efficiency, we propose a Fast Stroke Stacking algorithm, which accelerates training by 13 times. Moreover, we propose a Stroke-density Loss, which encourages the model to use small strokes for detailed regions and thereby improves reconstruction quality. Finally, we propose a new stroke diffusion model for both conditional and unconditional stroke-based generation, which denoises in the stroke parameter space and facilitates stroke-based inpainting and editing applications that can assist human artists' design work. Extensive experiments show that AttentionPainter outperforms the state-of-the-art neural painting methods.

Paper Structure

This paper contains 32 sections, 12 equations, 9 figures, 8 tables.

Figures (9)

  • Figure 1: Stroke-based Rendering (SBR) process and comparison between different methods. (a) SBR aims to recreate an image with a sequence of strokes. (b) Optimization-based methods optimize a sequence of stroke parameters to reconstruct the input image, which requires a separate optimization for each image. (c) Reinforcement Learning (RL)/Auto-regressive methods train an agent to predict a small number ($k<10$) of strokes at each step and iterate to obtain the final sequence. (d) Our AttentionPainter predicts a large number of strokes ($m>100$) within a single forward step, and is faster than the other methods during inference.
  • Figure 2: Neural stroke renderer and stroke design. Neural Stroke Renderer is used to simulate the Texture Renderer (which performs geometric transformations directly on texture, but is not differentiable). We use Oil strokes and Bézier curve strokes in this paper.
  • Figure 3: AttentionPainter Architecture. Given an image $\mathbf{I}$, 1) the Stroke Predictor predicts a large number of strokes in a single forward pass: it first extracts features, and then predicts the stroke parameter sequence using cross-attention and self-attention blocks. 2) With the predicted stroke parameters, the Stroke Renderer renders the stroke frame for each stroke. 3) Finally, the Fast Stroke Stacking (FSS) module simplifies the stroke stacking process by selecting the top $k$ strokes for each pixel to stack, and creates the final rendering. AttentionPainter is trained with a pixel-wise loss and a newly proposed stroke-density loss.
  • Figure 4: An example of the FSS calculation process. For better illustration, here we set the stroke number $N$ to $3$ (typically it is much larger, e.g., $256$) and use top-$2$ as the top-$k$, and we mark the strokes with different colors to distinguish them.
  • Figure 5: The Stroke Diffusion Model (SDM) performs the diffusion and denoising processes in the stroke parameter space, where the denoising stage uses an attention-based network (8 cross-attention blocks and 8 self-attention blocks). The proposed stroke predictor in AttentionPainter is used to obtain the stroke parameters from images, and the denoised stroke parameters are decoded to the output image by the Neural Renderer and our FSS module.
  • ...and 4 more figures
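
The captions for Figures 3 and 4 describe Fast Stroke Stacking as selecting the top $k$ strokes for each pixel before stacking. The NumPy sketch below illustrates one plausible reading of that idea and is not the authors' implementation. It assumes that each stroke has a per-pixel alpha map and a single RGB color, that strokes are painted in index order, and that "top $k$" means the $k$ latest strokes covering a pixel. The function names `stack_all` and `stack_top_k` are hypothetical.

```python
import numpy as np


def stack_all(canvas, colors, alphas):
    """Reference: alpha-composite all N strokes in order (stroke 0 first).

    canvas: (H, W, 3), colors: (N, 3), alphas: (N, H, W) with values in [0, 1].
    """
    out = canvas.copy()
    for color, alpha in zip(colors, alphas):
        a = alpha[..., None]
        out = out * (1.0 - a) + color * a
    return out


def stack_top_k(canvas, colors, alphas, k):
    """Composite only the k latest strokes that cover each pixel.

    Assumption: a stroke "covers" a pixel when its alpha there is > 0.
    """
    n = alphas.shape[0]
    # Score each stroke per pixel: paint order (1..N) if it covers the pixel, else 0.
    order = np.arange(1, n + 1, dtype=np.float64)[:, None, None]
    score = np.where(alphas > 0, order, 0.0)                # (N, H, W)
    # Indices of the k highest scores per pixel, in ascending paint order.
    idx = np.argsort(score, axis=0)[-k:]                    # (k, H, W)
    sel_alpha = np.take_along_axis(alphas, idx, axis=0)     # (k, H, W)
    sel_color = colors[idx]                                 # (k, H, W, 3)
    out = canvas.copy()
    for color, alpha in zip(sel_color, sel_alpha):
        a = alpha[..., None]
        out = out * (1.0 - a) + color * a
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    n, h, w = 3, 4, 4                       # N = 3 strokes, as in the Figure 4 example
    alphas = rng.random((n, h, w))
    alphas[alphas < 0.4] = 0.0              # leave some pixels uncovered
    colors = rng.random((n, 3))
    canvas = np.ones((h, w, 3))
    approx = stack_top_k(canvas, colors, alphas, k=2)
    exact = stack_all(canvas, colors, alphas)
    print("max abs difference, top-2 vs. full stacking:", np.abs(approx - exact).max())
```

Under these assumptions the compositing loop runs $k$ times instead of $N$ times. With binary alpha maps, top-$1$ already reproduces full stacking, because only the latest covering stroke is visible; with soft alphas the result is an approximation that becomes exact at $k = N$.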