PrismLayers: Open Data for High-Quality Multi-Layer Transparent Image Generative Models

Junwen Chen, Heyang Jiang, Yanbin Wang, Keming Wu, Ji Li, Chao Zhang, Keiji Yanai, Dong Chen, Yuhui Yuan

TL;DR

This work tackles the lack of open, high-quality multi-layer transparent data by introducing the PrismLayers and PrismLayersPro datasets, alongside a training-free synthesis pipeline (LayerFLUX, MultiLayerFLUX) and an open baseline model (ART+). By generating and filtering layered data and fine-tuning ART on PrismLayersPro, the work demonstrates improved layer quality, coherence, and style control, enabling editable multi-layer imagery for design workflows. A dedicated quality metric, the Transparent Image Preference Score (TIPS), underpins data curation and evaluation. The open datasets, tooling, and baseline provide a foundation for future research in precise, editable multi-layer transparent image generation.

Abstract

Generating high-quality, multi-layer transparent images from text prompts can unlock a new level of creative control, allowing users to edit each layer as effortlessly as editing text outputs from LLMs. However, the development of multi-layer generative models lags behind that of conventional text-to-image models due to the absence of a large, high-quality corpus of multi-layer transparent data. In this paper, we address this fundamental challenge by: (i) releasing the first open, ultra-high-fidelity PrismLayers (PrismLayersPro) dataset of 200K (20K) multi-layer transparent images with accurate alpha mattes, (ii) introducing a training-free synthesis pipeline that generates such data on demand using off-the-shelf diffusion models, and (iii) delivering a strong, open-source multi-layer generation model, ART+, which matches the aesthetics of modern text-to-image generation models. The key technical contributions include: LayerFLUX, which excels at generating high-quality single transparent layers with accurate alpha mattes, and MultiLayerFLUX, which composes multiple LayerFLUX outputs into complete images, guided by human-annotated semantic layout. To ensure higher quality, we apply a rigorous filtering stage to remove artifacts and semantic mismatches, followed by human selection. Fine-tuning the state-of-the-art ART model on our synthetic PrismLayersPro yields ART+, which outperforms the original ART in 60% of head-to-head user study comparisons and even matches the visual quality of images generated by the FLUX.1-[dev] model. We anticipate that our work will establish a solid dataset foundation for the multi-layer transparent image generation task, enabling research and applications that require precise, editable, and visually compelling layered imagery.
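
To make concrete what it means to compose transparent layers with alpha mattes into a complete image, the sketch below flattens a back-to-front stack of RGBA layers using the standard Porter-Duff "over" operator. This is a generic illustration, not the MultiLayerFLUX implementation: the names `Pixel`, `Layer`, `over`, and `composite` are hypothetical, and it assumes all layers share one canvas size and use straight (non-premultiplied) alpha in [0, 1].

```python
from typing import List, Tuple

# Assumption: straight (non-premultiplied) RGBA with every channel in [0, 1].
Pixel = Tuple[float, float, float, float]
Layer = List[List[Pixel]]  # rows of pixels; all layers share one canvas size


def over(top: Pixel, bottom: Pixel) -> Pixel:
    """Porter-Duff 'over': place `top` on `bottom` and return the blended pixel."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    out_a = ta + ba * (1.0 - ta)
    if out_a == 0.0:
        # Both pixels are fully transparent, so the colour is irrelevant.
        return (0.0, 0.0, 0.0, 0.0)

    def blend(t: float, b: float) -> float:
        # Blend in premultiplied space, then divide to return to straight alpha.
        return (t * ta + b * ba * (1.0 - ta)) / out_a

    return (blend(tr, br), blend(tg, bg), blend(tb, bb), out_a)


def composite(layers: List[Layer]) -> Layer:
    """Flatten same-sized layers ordered back to front into one RGBA image."""
    result = layers[0]
    for layer in layers[1:]:
        result = [
            [over(t, b) for t, b in zip(top_row, bottom_row)]
            for top_row, bottom_row in zip(layer, result)
        ]
    return result


# Usage: a half-transparent blue layer over an opaque red background.
background: Layer = [[(1.0, 0.0, 0.0, 1.0)]]
foreground: Layer = [[(0.0, 0.0, 1.0, 0.5)]]
flattened = composite([background, foreground])
```

Because each layer keeps its own alpha matte, any layer can be moved, restyled, or removed and the stack re-flattened without touching the others, which is the editability the abstract refers to.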

Paper Structure

This paper contains 11 sections, 3 equations, 17 figures, 10 tables.

Figures (17)

  • Figure 1: Illustration of key statistics from PrismLayers (number of layers) and PrismLayersPro (diversity of styles), along with representative high-quality synthetic multi-layer transparent images from PrismLayersPro.
  • Figure 2: User study results on the effectiveness of PrismLayersPro. Left: ART+ vs. ART. Right: ART+ vs. MultiLayerFLUX. With fine-tuning on PrismLayersPro, ART+ achieves the best performance.
  • Figure 3: Key dataset statistics of PrismLayers and PrismLayersPro.
  • Figure 4: Illustrating the aesthetic quality of the crawled data (columns 1 and 4), synthetic data (columns 2 and 5), and high-quality synthetic data generated with a style prompt (columns 3 and 6).
  • Figure 5: Dataset curation pipeline of PrismLayers and PrismLayersPro. We first extract semantic layouts from a database of 800K crawled multi-layer graphic design images. Then, we apply MultiLayerFLUX to generate high-quality multi-layer transparent images. An Artifact Classifier is used to evaluate the quality of each composed image, discarding low-quality results to construct PrismLayers. We also apply the Transparent Image Preference Score (TIPS) model to assess the quality of individual transparent layers. By filtering for aesthetic quality and balancing the number of layers, we collect an 80K-image reference layout pool. From this pool, we sample 20K of the highest-quality layouts and regenerate them with style prompts, followed by manual selection, to form our released open-source, high-quality multi-layer dataset, PrismLayersPro.
  • ...and 12 more figures
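
The filtering stages described in the Figure 5 caption can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the `Sample` fields, the `max_artifact_prob` threshold, and the round-robin balancing across layer counts are all hypothetical, since the caption does not specify how scores are thresholded or how the number of layers is balanced.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Sample:
    sample_id: str
    num_layers: int
    artifact_prob: float  # hypothetical: classifier probability that the composite has artifacts
    tips_score: float     # hypothetical: aggregate TIPS over the sample's transparent layers


def curate(samples: List[Sample], max_artifact_prob: float, pool_size: int) -> List[Sample]:
    """Drop artifact-prone composites, then build a layer-count-balanced pool by TIPS."""
    # Stage 1: discard composites that the artifact classifier flags.
    clean = [s for s in samples if s.artifact_prob <= max_artifact_prob]

    # Stage 2: group by layer count and rank each group by layer quality.
    buckets: Dict[int, List[Sample]] = defaultdict(list)
    for s in clean:
        buckets[s.num_layers].append(s)
    for group in buckets.values():
        group.sort(key=lambda s: s.tips_score, reverse=True)

    # Stage 3 (assumed strategy): take the best remaining sample from each
    # layer-count bucket in turn, so no single layer count dominates the pool.
    pool: List[Sample] = []
    rank = 0
    while len(pool) < pool_size:
        added = False
        for n in sorted(buckets):
            if rank < len(buckets[n]) and len(pool) < pool_size:
                pool.append(buckets[n][rank])
                added = True
        if not added:
            break  # every bucket is exhausted
        rank += 1
    return pool


# Usage with toy values; the threshold and pool size are illustrative only.
toy = [
    Sample("a", 2, 0.1, 0.90),
    Sample("b", 2, 0.9, 0.95),  # flagged by the artifact classifier, so dropped
    Sample("c", 2, 0.2, 0.50),
    Sample("d", 3, 0.1, 0.70),
    Sample("e", 3, 0.0, 0.80),
]
selected = curate(toy, max_artifact_prob=0.5, pool_size=3)
```

In the pipeline the caption describes, the selected layouts would then be regenerated with style prompts and passed to manual selection; those steps involve model inference and human review, so they are not sketched here.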