Consistent Image Layout Editing with Diffusion Models

Tao Xia; Yudi Zhang; Ting Liu; Lei Zhang

TL;DR

This work tackles the challenge of editing real-image layouts using diffusion models by introducing a two-stage framework that first learns multiple object concepts from a single image (Multi-Concept Learning) and then enforces layout guidance with an appearance-projection mechanism grounded in diffusion-feature semantic consistency. It combines a region-based cross-attention loss for layout control with an Unconditional Appearance Projection and Region Prior Appearance Projection to transfer and refine object appearance in the edited regions, aided by a layout-friendly initialization noise. An asynchronous editing strategy further mitigates concept entanglement while maintaining fidelity. Extensive experiments on Layout-Bench show superior layout alignment and image quality compared with prior methods, demonstrating the practical viability of semantically consistent, diffusion-based layout editing for real images.
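As a rough illustration of what a region-based cross-attention loss can look like, the sketch below penalizes attention mass that falls outside a target box. This is a form commonly used in layout-guidance work and is only an assumption here; the summary above does not state the exact loss the authors use. The names `box_mask` and `region_attention_loss`, the normalized box format, and the squared penalty are all hypothetical choices made for this example.

```python
import numpy as np


def box_mask(height, width, box):
    """Binary mask for box = (x0, y0, x1, y1) in normalized [0, 1] coordinates.

    The box format is an assumption made for this sketch.
    """
    x0, y0, x1, y1 = box
    mask = np.zeros((height, width), dtype=np.float64)
    r0, r1 = int(round(y0 * height)), int(round(y1 * height))
    c0, c1 = int(round(x0 * width)), int(round(x1 * width))
    mask[r0:r1, c0:c1] = 1.0
    return mask


def region_attention_loss(attn, mask, eps=1e-8):
    """Squared shortfall of the attention mass that lies inside the mask.

    attn: (H, W) non-negative cross-attention map for one object token.
    mask: (H, W) binary mask of the region where the object should appear.
    The loss is near 0 when all attention is inside the region and
    approaches 1 when none of it is.
    """
    inside = float((attn * mask).sum())
    total = float(attn.sum()) + eps
    return (1.0 - inside / total) ** 2


if __name__ == "__main__":
    # Uniform attention over a 4x4 map; the target box is the left half.
    attn = np.full((4, 4), 1.0 / 16.0)
    mask = box_mask(4, 4, (0.0, 0.0, 0.5, 1.0))
    print(region_attention_loss(attn, mask))
```

In a guidance setting, a loss of this kind would typically be differentiated with respect to the noisy latent to nudge the object toward its target region; that step depends on the diffusion model and is left out of this sketch.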

Abstract

Despite the great success of large-scale text-to-image diffusion models in image generation and image editing, existing methods still struggle to edit the layout of real images. Although a few works have been proposed to tackle this problem, they either fail to adjust the layout of images or have difficulty preserving the visual appearance of objects after the layout adjustment. To bridge this gap, this paper proposes a novel image layout editing method that can not only re-arrange a real image to a specified layout, but also keep the visual appearance of the objects consistent with their appearance before editing. Concretely, the proposed method consists of two key components. First, a multi-concept learning scheme is used to learn the concepts of different objects from a single image, which is crucial for keeping visual consistency during layout editing. Second, it leverages the semantic consistency within intermediate features of diffusion models to project the appearance information of objects directly onto the desired regions. In addition, a novel initialization noise design is adopted to facilitate the process of re-arranging the layout. Extensive experiments demonstrate that the proposed method outperforms previous works in both layout alignment and visual consistency for the task of image layout editing.
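The idea of projecting appearance through semantically consistent features can be pictured with a toy nearest-neighbor match: each location in the target region copies the appearance of the source location whose feature vector is most similar. The sketch below is only an illustration under that assumption; the abstract does not specify the matching rule, and the function name `project_appearance`, the cosine-similarity criterion, and the array shapes are hypothetical. The actual method operates on intermediate features inside the diffusion model.

```python
import numpy as np


def project_appearance(src_feats, src_appearance, tgt_feats, eps=1e-8):
    """Copy appearance from the most similar source location to each target location.

    src_feats:      (N, D) feature vectors at N source-object locations.
    src_appearance: (N, C) appearance vectors at the same N locations.
    tgt_feats:      (M, D) feature vectors at M target-region locations.
    Returns the (M, C) projected appearance and the (M,) matched source indices.
    Cosine similarity is an assumed matching criterion for this sketch.
    """
    def normalize(x):
        return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

    similarity = normalize(tgt_feats) @ normalize(src_feats).T  # shape (M, N)
    nearest = similarity.argmax(axis=1)
    return src_appearance[nearest], nearest


if __name__ == "__main__":
    src_feats = np.array([[1.0, 0.0], [0.0, 1.0]])
    src_appearance = np.array([[10.0], [20.0]])
    tgt_feats = np.array([[0.1, 0.9], [2.0, 0.1]])
    projected, idx = project_appearance(src_feats, src_appearance, tgt_feats)
    print(idx, projected.ravel())
```

Hard nearest-neighbor copying is the simplest possible choice; a softmax-weighted average over source locations would be a smoother alternative, at the cost of blurring fine detail.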
