Step-by-step Layered Design Generation
Faizan Farooq Khan, K J Joseph, Koustava Goswami, Mohamed Elhoseiny, Balaji Vasan Srinivasan
TL;DR
This paper introduces Step-by-step Layered Design Generation and SLEDGE, a framework that combines multi-modal LLMs with diffusion models to produce iterative, editable design updates. It formalizes the problem, proposes a three-stage pipeline to fuse visual and textual signals, and introduces a new dataset (IDeation) and benchmark for evaluation across themes and instructions. Experiments show that SLEDGE outperforms strong baselines on fidelity, aesthetics, and edit adherence, and ablations highlight the contribution of the layer-wise editing components and targeted loss functions. The work targets human-AI co-creation in graphic design, where a design is built up one instruction at a time.
Abstract
Design generation, in its essence, is a step-by-step process where designers progressively refine and enhance their work through careful modifications. Despite this fundamental characteristic, existing approaches mainly treat design synthesis as a single-step generation problem, significantly underestimating the inherent complexity of the creative process. To bridge this gap, we propose a novel problem setting called Step-by-Step Layered Design Generation, which tasks a machine learning model with generating a design that adheres to a sequence of instructions from a designer. Leveraging recent advancements in multi-modal LLMs, we propose SLEDGE: Step-by-step LayEred Design GEnerator to model each update to a design as an atomic, layered change over its previous state, while being grounded in the instruction. To complement our new problem setting, we introduce a new evaluation suite, including a dataset and a benchmark. Our exhaustive experimental analysis and comparison with state-of-the-art approaches tailored to our new setup demonstrate the efficacy of our approach. We hope our work will attract attention to this pragmatic and under-explored research area.
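The problem setting described in the abstract can be illustrated with a minimal sketch. This is not the SLEDGE implementation: the names `Layer`, `generate_step_by_step`, and `dummy_propose_layer` are hypothetical, and the layer content is a placeholder string standing in for whatever a real model would render. The sketch only shows the shape of the task, in which each instruction contributes exactly one new layer on top of the previous design state.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple


@dataclass(frozen=True)
class Layer:
    """One atomic change to the design (hypothetical structure)."""
    instruction: str  # the instruction this layer is grounded in
    content: str      # placeholder for rendered content, e.g. an RGBA image


# A design state is an ordered stack of layers, bottom layer first.
Design = Tuple[Layer, ...]


def generate_step_by_step(
    instructions: Sequence[str],
    propose_layer: Callable[[Design, str], Layer],
    initial: Design = (),
) -> List[Design]:
    """Apply instructions in order and return the design state after each step.

    `propose_layer` stands in for the model: it sees the previous state and
    the current instruction, and returns a single new layer.
    """
    states: List[Design] = []
    design: Design = tuple(initial)
    for instruction in instructions:
        layer = propose_layer(design, instruction)
        design = design + (layer,)  # earlier layers are left untouched
        states.append(design)
    return states


def dummy_propose_layer(design: Design, instruction: str) -> Layer:
    """Toy stand-in for a model; it just records the instruction."""
    return Layer(instruction=instruction,
                 content=f"layer {len(design)}: {instruction}")


states = generate_step_by_step(
    ["add a blue background", "add a title"], dummy_propose_layer
)
```

Because each state extends the previous one without modifying it, every intermediate design stays available and individual layers remain separately editable, which is the property the layered formulation is meant to preserve.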
