Step-by-step Layered Design Generation

Faizan Farooq Khan, K J Joseph, Koustava Goswami, Mohamed Elhoseiny, Balaji Vasan Srinivasan

TL;DR

This paper introduces Step-by-step Layered Design Generation, a principled framework (SLEDGE) that marries multi-modal LLMs with diffusion models to produce iterative, editable design updates. It formalizes the problem, proposes a three-stage pipeline to fuse visual and textual signals, and integrates a new dataset (IDeation) and benchmark to enable robust evaluation across themes and instructions. Extensive experiments show SLEDGE outperforms strong baselines on fidelity, aesthetics, and edit adherence, with ablations highlighting the value of layer-wise editing components and targeted loss functions. The work advances human-AI co-creation in graphic design and opens doors to scalable, transparent, stepwise design generation.

Abstract

Design generation, in its essence, is a step-by-step process where designers progressively refine and enhance their work through careful modifications. Despite this fundamental characteristic, existing approaches mainly treat design synthesis as a single-step generation problem, significantly underestimating the inherent complexity of the creative process. To bridge this gap, we propose a novel problem setting called Step-by-Step Layered Design Generation, which tasks a machine learning model with generating a design that adheres to a sequence of instructions from a designer. Leveraging recent advancements in multi-modal LLMs, we propose SLEDGE: Step-by-step LayEred Design GEnerator to model each update to a design as an atomic, layered change over its previous state, while being grounded in the instruction. To complement our new problem setting, we introduce a new evaluation suite, including a dataset and a benchmark. Our exhaustive experimental analysis and comparison with state-of-the-art approaches tailored to our new setup demonstrate the efficacy of our approach. We hope our work will attract attention to this pragmatic and under-explored research area.

Paper Structure

This paper contains 24 sections, 3 equations, 16 figures, 2 tables.

Figures (16)

  • Figure 2: The figure provides an overview of SLEDGE: Step-by-step Layered Design Generator. The current state of the canvas $\mathbf{C}_{t}$, the instruction from the user $\mathbf{I}_{t}$, and an optional image to be inserted $\mathbf{U}_{t}$ are provided to the framework. An MLLM unifies these signals to generate the next state of the canvas $\mathbf{C}_{t+1}$, along with the associated metadata, enabling layer-by-layer generation.
  • Figure 3: Aligning encoder and decoder.
  • Figure 4: Aligning MLLM and decoder.
  • Figure 5: The figure illustrates our data generation pipeline, where each design element, along with its final composition, is processed using GPT-4o. This process generates a structured mapping of layer order and detailed textual instructions that describe the transformation applied from one layer to the subsequent one.
  • Figure 6: Performance comparison averaged across both datasets on three key aspects: theme adherence, aesthetic quality, and edit compliance. Each baseline is compared with SLEDGE, using GPT-4o and InternLM-XComposer as the evaluators.
  • ...and 11 more figures
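
The stepwise setting described in the abstract and in the Figure 2 caption (canvas $\mathbf{C}_{t}$, instruction $\mathbf{I}_{t}$, optional image $\mathbf{U}_{t}$, next canvas $\mathbf{C}_{t+1}$) can be illustrated with a minimal Python sketch. This is not the authors' implementation: the names `Layer`, `Canvas`, `apply_step`, and `run_session` are hypothetical, and `apply_step` is only a stand-in that records each instruction as a new layer, where SLEDGE would instead call an MLLM and a diffusion decoder. The sketch only shows the shape of the loop, in which every update is an atomic change over the previous state.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Layer:
    """One atomic update (hypothetical structure, not from the paper)."""
    instruction: str              # I_t: the designer's instruction
    asset: Optional[str] = None   # U_t: optional image to insert (a path here)


@dataclass(frozen=True)
class Canvas:
    """Canvas state C_t, represented only as an ordered stack of layers."""
    layers: Tuple[Layer, ...] = ()


def apply_step(canvas: Canvas, instruction: str,
               asset: Optional[str] = None) -> Canvas:
    """Stand-in for the model: return C_{t+1} given (C_t, I_t, U_t).

    Assumption: a real system would render pixels and metadata; this
    placeholder just appends a layer and leaves the old state untouched.
    """
    return Canvas(canvas.layers + (Layer(instruction, asset),))


def run_session(steps: List[Tuple[str, Optional[str]]]) -> List[Canvas]:
    """Apply a sequence of instructions, keeping every intermediate state."""
    canvas = Canvas()
    history = [canvas]
    for instruction, asset in steps:
        canvas = apply_step(canvas, instruction, asset)
        history.append(canvas)
    return history


if __name__ == "__main__":
    history = run_session([
        ("Add a pastel blue background", None),
        ("Place the logo in the top-left corner", "logo.png"),
    ])
    for t, state in enumerate(history):
        print(t, [layer.instruction for layer in state.layers])
```

Keeping each state immutable means any intermediate canvas can be revisited or edited later, which is the property the layered formulation is meant to provide.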