ToLo: A Two-Stage, Training-Free Layout-To-Image Generation Framework For High-Overlap Layouts
Linhao Huang, Jing Yu
TL;DR
This paper tackles the challenge of accurate layout-to-image generation when input layouts exhibit significant overlap. It introduces ToLo, a two-stage, training-free framework that first aggregates each concept's attention map within its target region and then separates the maps to reduce cross-concept leakage, guided by the losses $L_{\rm agg}$ and $L_{\rm sep}$, respectively. By applying ToLo to the RnB baseline and evaluating on an IoU-partitioned version of HRS-Bench, the authors demonstrate robust improvements for high-overlap layouts, while noting some trade-offs in object size control that can be mitigated by an IoU-based mode switch. The work contributes a practical, inference-time method that enhances spatial fidelity in diffusion-based layout-to-image generation and provides a new dataset partitioning strategy to benchmark overlap handling.
Abstract
Recent training-free layout-to-image diffusion models have demonstrated remarkable performance in generating high-quality images with controllable layouts. These models follow a one-stage framework: they define attention-map-based losses that encourage the model to focus the attention map of each concept on its corresponding region. However, these models still struggle to accurately follow layouts with significant overlap, often leading to issues such as attribute leakage and missing entities. In this paper, we propose ToLo, a two-stage, training-free layout-to-image generation framework for high-overlap layouts. Our framework consists of two stages, the aggregation stage and the separation stage, each with its own loss function based on the attention map. To provide a more effective evaluation, we partition the HRS dataset based on the Intersection over Union (IoU) of the input layouts, creating a new dataset for layout-to-image generation with varying levels of overlap. Through extensive experiments on this dataset, we demonstrate that ToLo significantly enhances the performance of existing methods when dealing with high-overlap layouts. Our code and dataset are available at https://github.com/misaka12435/ToLo.
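To make the IoU-based partitioning concrete, the following is a minimal sketch of how a layout could be scored and bucketed by overlap. It is not taken from the released code: the box format `(x_min, y_min, x_max, y_max)`, the choice of the maximum pairwise IoU as the layout-level score, the helper names, and the bucket edges `0.1` and `0.3` are all assumptions made for illustration.

```python
from itertools import combinations
from typing import Sequence, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in any consistent unit.
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Intersection over Union of two axis-aligned boxes."""
    # Width and height of the intersection rectangle (zero if disjoint).
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def layout_overlap(boxes: Sequence[Box]) -> float:
    """Layout-level score: maximum pairwise IoU (an assumed aggregation)."""
    return max((box_iou(a, b) for a, b in combinations(boxes, 2)), default=0.0)


def overlap_bucket(boxes: Sequence[Box], edges: Tuple[float, float] = (0.1, 0.3)) -> str:
    """Assign a layout to a hypothetical low/medium/high overlap bucket."""
    score = layout_overlap(boxes)
    if score < edges[0]:
        return "low"
    if score < edges[1]:
        return "medium"
    return "high"
```

For example, the boxes `(0, 0, 2, 2)` and `(1, 1, 3, 3)` intersect in a unit square and have a union of area 7, so their IoU is 1/7 and the layout falls into the "medium" bucket under the assumed edges. The same score could in principle drive an IoU-based mode switch, although the threshold the authors use for that is not stated here.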
