Create Anything Anywhere: Layout-Controllable Personalized Diffusion Model for Multiple Subjects

Wei Li, Hebei Li, Yansong Peng, Siying Wu, Yueyi Zhang, Xiaoyan Sun

TL;DR

The paper tackles the lack of precise layout controllability in personalized diffusion-based generation across multiple subjects. It introduces LCP-Diffusion, a tuning-free framework that combines a Dynamic-Static Complementary Visual Refining module with a Dual Layout Control mechanism to preserve subject identity while enforcing spatial constraints, using an adapter to inject detail and layout cues. Empirical results on DreamBench and MultiBench, along with targeted ablations, demonstrate notable gains in identity fidelity and layout accuracy, validating the method's effectiveness and robustness. The work enables "create anything anywhere" capabilities and offers a path to extend to other customizable diffusion models.

Abstract

Diffusion models have significantly advanced text-to-image generation, laying the foundation for the development of personalized generative frameworks. However, existing methods lack precise layout controllability and overlook the potential of dynamic features of reference subjects in improving fidelity. In this work, we propose the Layout-Controllable Personalized Diffusion (LCP-Diffusion) model, a novel framework that integrates subject identity preservation with flexible layout guidance in a tuning-free approach. Our model employs a Dynamic-Static Complementary Visual Refining module to comprehensively capture the intricate details of reference subjects, and introduces a Dual Layout Control mechanism to enforce robust spatial control across both training and inference stages. Extensive experiments validate that LCP-Diffusion excels in both identity preservation and layout controllability. To the best of our knowledge, this is a pioneering work enabling users to "create anything anywhere".

Paper Structure

This paper contains 16 sections, 11 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: Illustration of LCP-Diffusion, a novel personalized layout-controllable framework. It supports single-subject (top-left), multi-subject (top-right), and multi-layout (bottom) scenarios, accommodating flexible input combinations of multiple reference images, text prompts, and layouts. LCP-Diffusion preserves subject identity, aligns with textual descriptions, and precisely adheres to layout constraints simultaneously, which allows “creating anything anywhere”.
  • Figure 2: Left: Overview of the proposed framework; Right: The structures of Static Detail Refiner and Attention Block in UNet.
  • Figure 3: Qualitative results show that LCP-Diffusion performs best in both accurate layout control and faithful text prompt alignment.
  • Figure 4: Qualitative results show the fine-grained detail preservation ability beyond layout controllability.
  • Figure 5: An illustration of the superior performance of the designed static detail refiner.