Artistic Intelligence: A Diffusion-Based Framework for High-Fidelity Landscape Painting Synthesis

Wanggong Yang, Yifei Zhao

TL;DR

This paper introduces LPGen, a novel diffusion-based model specifically designed for landscape painting generation that outperforms current state-of-the-art models in producing structurally accurate and stylistically coherent paintings.

Abstract

Generating high-fidelity landscape paintings remains a challenging task that requires precise control over both structure and style. In this paper, we present LPGen, a novel diffusion-based model specifically designed for landscape painting generation. LPGen introduces a decoupled cross-attention mechanism that processes structural and stylistic features independently, mimicking the layered approach of traditional painting techniques. LPGen also includes a structural controller, a multi-scale encoder that controls the layout of landscape paintings and balances aesthetics with composition. The model is pre-trained on a curated dataset of high-resolution landscape images, categorized by distinct artistic styles, and then fine-tuned to ensure detailed and consistent output. In extensive evaluations, LPGen produces paintings that are both structurally accurate and stylistically coherent, surpassing current state-of-the-art models. This work advances AI-generated art and offers new avenues for exploring the intersection of technology and traditional artistic practices. Our code, dataset, and model weights will be publicly available.
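
As a rough illustration of the decoupled cross-attention idea described in the abstract, the sketch below runs two independent attention passes over the same queries, one against structural features and one against style features, and sums the results. It is a minimal pure-Python sketch, not LPGen's implementation: the function names, the `style_scale` weight, and the additive combination are assumptions borrowed from common decoupled-attention designs, and the learned projection matrices a real model would apply are omitted.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over plain lists of vectors."""
    scale = math.sqrt(len(keys[0]))
    dim = len(values[0])
    out = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(dim)])
    return out


def decoupled_cross_attention(queries, structure_kv, style_kv, style_scale=1.0):
    """Attend to structure and style separately, then add the two outputs.

    structure_kv and style_kv are (keys, values) tuples. style_scale is a
    hypothetical knob that weights the style branch; 0.0 disables it.
    """
    structure_out = attention(queries, *structure_kv)
    style_out = attention(queries, *style_kv)
    return [[s + style_scale * t for s, t in zip(s_row, t_row)]
            for s_row, t_row in zip(structure_out, style_out)]
```

Because each branch has its own keys and values, the style contribution can be scaled or switched off without touching the structural pass, which is the practical appeal of keeping the two conditions decoupled.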

Paper Structure (19 sections, 11 equations, 8 figures, 2 tables)

Figures (8)

  • Figure 1: Hand-drawing method compared with the proposed LPGen method. The hand-drawing method involves a complex and repetitive process with several key steps such as outlining, chapping, rubbing, moss-dotting, and coloring. LPGen simplifies the process, making it convenient and flexible by precisely generating landscape paintings from a specified style reference image and a Canny outline image.
  • Figure 2: Schematic of LPGen for landscape painting generation. The schematic comprises two key components: the structure controller and the style controller. The structure controller module utilizes the Decoupled Cross-Attention technique to separately manage the structural information of an image across different domains, allowing for precise control and regulation of specific elements and attributes in the generated image. The style controller dynamically adjusts the features of the input image, enabling the generative model to accurately capture and reflect the style and structure of the source image.
  • Figure 3: Experimental dataset processing workflow. The workflow begins with collecting raw image data, followed by cleaning the data to eliminate noise, duplicates, and irrelevant information. The data is then pre-processed by resizing images and converting formats. Finally, matching pairs of text and images are created for model training.
  • Figure 4: Diverse landscape paintings generated by our method. Our method, LPGen, is capable of producing artworks in various styles and creating differentiated paintings within the same style. Reference 1 through Reference 6 represent landscape paintings generated in the same style but with different Canny edge maps. Canny 1 through Canny 6 depict landscape paintings generated with the same Canny edge map but using different style references.
  • Figure 5: Comparison between the proposed method and mainstream methods for landscape painting generation. Figure 6a displays the constraint Canny edge map, while Figure 6b shows the target ink-style reference image. The proposed method uses a Canny image to control the structure and a reference image to control the generated painting style. Each method generated landscape paintings using four different Canny edge maps, resulting in a total of 16 images.
  • ...and 3 more figures
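
The dataset workflow summarized in the Figure 3 caption (remove duplicates, resize, then build text-image pairs) can be sketched as follows. This is a hedged, standard-library-only illustration, not the authors' pipeline: every function name, the 1024-pixel cap, and the match-by-file-stem rule are assumptions made for the example, and only byte-identical duplicates are detected.

```python
import hashlib
from pathlib import PurePath


def drop_exact_duplicates(images):
    """Keep the first image for each distinct byte content.

    images maps a file name to raw bytes. Near-duplicates (re-encoded or
    resized copies) are not caught; that would need perceptual hashing.
    """
    seen = set()
    unique = {}
    for name, data in images.items():
        digest = hashlib.sha256(data).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique[name] = data
    return unique


def fit_within(width, height, max_side=1024):
    """Target size that keeps the aspect ratio and caps the longer side.

    Images already within max_side are left at their original size.
    """
    scale = min(1.0, max_side / max(width, height))
    return max(1, round(width * scale)), max(1, round(height * scale))


def pair_with_captions(image_names, captions):
    """Pair each image with the caption keyed by its file stem.

    Images without a matching caption are skipped.
    """
    pairs = []
    for name in image_names:
        stem = PurePath(name).stem
        if stem in captions:
            pairs.append((name, captions[stem]))
    return pairs
```

In practice the resize step would be carried out with an imaging library; `fit_within` only computes the target dimensions so the sketch stays dependency-free.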