OmniPSD: Layered PSD Generation with Diffusion Transformer

Cheng Liu, Yiren Song, Haofan Wang, Mike Zheng Shou

TL;DR

OmniPSD presents a unified diffusion-transformer framework for layered PSD generation and decomposition with explicit alpha-channel handling. It introduces a shared RGBA-VAE latent space and two task-specific branches (text-to-PSD and image-to-PSD) within the Flux ecosystem, enabling in-context learning and iterative editing. A large Layered Poster Dataset supports training and evaluation, and extensive experiments show high fidelity, structural coherence, and accurate transparency in editable PSD outputs. The work establishes a new paradigm for design-aware, layered graphic generation and reconstruction using diffusion transformers.

Abstract

Recent advances in diffusion models have greatly improved image generation and editing, yet generating or reconstructing layered PSD files with transparent alpha channels remains highly challenging. We propose OmniPSD, a unified diffusion framework built upon the Flux ecosystem that enables both text-to-PSD generation and image-to-PSD decomposition through in-context learning. For text-to-PSD generation, OmniPSD arranges multiple target layers spatially into a single canvas and learns their compositional relationships through spatial attention, producing semantically coherent and hierarchically structured layers. For image-to-PSD decomposition, it performs iterative in-context editing, progressively extracting and erasing textual and foreground components to reconstruct editable PSD layers from a single flattened image. An RGBA-VAE is employed as an auxiliary representation module to preserve transparency without affecting structure learning. Extensive experiments on our new RGBA-layered dataset demonstrate that OmniPSD achieves high-fidelity generation, structural consistency, and transparency awareness, offering a new paradigm for layered design generation and decomposition with diffusion transformers.
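
The relationship between a set of RGBA layers and the single flattened image that image-to-PSD decomposition starts from can be illustrated with standard alpha compositing. The sketch below is not OmniPSD's code: it assumes straight (non-premultiplied) alpha, float channels in [0, 1], and layers ordered bottom to top, and the names `over` and `flatten` are hypothetical helpers introduced only for illustration.

```python
from typing import List, Tuple

# One pixel as (r, g, b, a), each channel a float in [0, 1], straight alpha.
Pixel = Tuple[float, float, float, float]
Layer = List[List[Pixel]]  # rows of pixels


def over(top: Pixel, bottom: Pixel) -> Pixel:
    """Composite `top` over `bottom` with the Porter-Duff 'over' operator."""
    tr, tg, tb, ta = top
    br, bg, bb, ba = bottom
    out_a = ta + ba * (1.0 - ta)
    if out_a == 0.0:
        # Both pixels are fully transparent; colour is undefined, use zeros.
        return (0.0, 0.0, 0.0, 0.0)

    def blend(t: float, b: float) -> float:
        return (t * ta + b * ba * (1.0 - ta)) / out_a

    return (blend(tr, br), blend(tg, bg), blend(tb, bb), out_a)


def flatten(layers: List[Layer]) -> Layer:
    """Flatten same-sized layers (ordered bottom to top) into one image."""
    result = [row[:] for row in layers[0]]
    for layer in layers[1:]:
        for y, row in enumerate(layer):
            for x, pixel in enumerate(row):
                result[y][x] = over(pixel, result[y][x])
    return result


# A 1x1 example: half-transparent red over opaque blue.
background: Layer = [[(0.0, 0.0, 1.0, 1.0)]]
foreground: Layer = [[(1.0, 0.0, 0.0, 0.5)]]
flat = flatten([background, foreground])
```

Decomposition is the inverse problem: given only `flat`, recover layers whose composite reproduces it, which is why the alpha channel of each layer has to be represented explicitly.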

Paper Structure

This paper contains 20 sections, 8 equations, 9 figures, 7 tables.

Figures (9)

  • Figure 1: OmniPSD is a Diffusion-Transformer framework that generates layered PSD files with transparent alpha channels. Our system supports both Text-to-PSD multi-layer synthesis and Image-to-PSD reconstruction, producing editable layers that preserve structure, transparency, and semantic consistency.
  • Figure 2: OmniPSD overview. A unified Diffusion-Transformer with a shared RGBA-VAE enables both text-to-PSD layered generation (left) and image-to-PSD decomposition (right). Text-to-PSD leverages spatial in-context learning with hierarchical captions, while Image-to-PSD performs iterative flow-guided foreground extraction and background restoration. Our method produces fully editable PSD layers with transparent alpha channels.
  • Figure 3: OmniPSD’s layered dataset. Image-to-PSD is trained on paired samples, while Text-to-PSD uses a $2\times2$ grid that presents the full poster and its decomposed layers for in-context learning.
  • Figure 4: Generation results of OmniPSD. (a) Image-to-PSD reconstruction decomposes an input poster into editable text layers, multiple foreground layers, and a clean background layer. (b) Text-to-PSD synthesis uses hierarchical captions to generate background and foreground layers, followed by rendering the corresponding editable text layers.
  • Figure 5: Comparison with baselines on text-to-PSD and image-to-PSD. OmniPSD matches the visual quality of leading diffusion and vision-language models while uniquely supporting multi-layer PSD generation with transparent alpha channels. Compared to existing layered synthesis baselines, it achieves clearly superior visual fidelity and more coherent, logically structured layers.
  • ...and 4 more figures
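
The $2\times2$ grid mentioned in the Figure 3 caption can be pictured as tiling four equally sized images into one canvas and splitting them back out afterwards. The sketch below is only an illustration under stated assumptions: the quadrant order (top-left, top-right, bottom-left, bottom-right) and the helper names `tile_2x2` and `split_2x2` are hypothetical, since the captions above do not say which layer occupies which quadrant.

```python
from typing import List, Sequence

Image = List[List[object]]  # rows of pixels; the pixel type is left opaque


def tile_2x2(tiles: Sequence[Image]) -> Image:
    """Tile four same-sized images into one canvas.

    Assumed order: top-left, top-right, bottom-left, bottom-right.
    """
    if len(tiles) != 4:
        raise ValueError("expected exactly four tiles")
    height, width = len(tiles[0]), len(tiles[0][0])
    for tile in tiles:
        if len(tile) != height or any(len(row) != width for row in tile):
            raise ValueError("all tiles must share the same size")
    top = [tiles[0][y] + tiles[1][y] for y in range(height)]
    bottom = [tiles[2][y] + tiles[3][y] for y in range(height)]
    return top + bottom


def split_2x2(canvas: Image) -> List[Image]:
    """Split a canvas back into its four quadrants, in the same order."""
    half_h, half_w = len(canvas) // 2, len(canvas[0]) // 2
    return [
        [row[:half_w] for row in canvas[:half_h]],
        [row[half_w:] for row in canvas[:half_h]],
        [row[:half_w] for row in canvas[half_h:]],
        [row[half_w:] for row in canvas[half_h:]],
    ]


# Four 2x2 placeholder "images", each filled with a label instead of pixels.
tiles = [[[name] * 2 for _ in range(2)] for name in ("full", "fg", "mid", "bg")]
canvas = tile_2x2(tiles)
```

Placing the full poster and its layers on one canvas lets a single generation pass see all of them at once, and splitting the canvas afterwards recovers the individual layers.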