MIGC++: Advanced Multi-Instance Generation Controller for Image Synthesis
Dewei Zhou, You Li, Fan Ma, Zongxin Yang, Yi Yang
TL;DR
The paper defines the Multi-Instance Generation (MIG) task and addresses its three challenges: attribute leakage between instances, limited modalities for describing instances, and consistency across iterative edits when generating multiple, precisely placed objects within a single image. It introduces MIGC, a divide-and-conquer controller that shades each instance separately and merges the results to prevent leakage, and MIGC++, which adds multimodal attribute control (text and image) and fine-grained position control (boxes and masks) via Multimodal Enhance Attention and a Refined Shader. For iterative MIG, the Consistent-MIG algorithm preserves unmodified regions and maintains instance identity across edits. The authors validate on the COCO-MIG and Multimodal-MIG benchmarks, reporting substantial gains in ISR, MIoU, AP, and text-image alignment over state-of-the-art baselines, and demonstrate robustness across varying instance counts and modalities.
Abstract
We introduce the Multi-Instance Generation (MIG) task, which focuses on generating multiple instances within a single image, each accurately placed at a predefined position with attributes such as category, color, and shape, strictly following user specifications. MIG faces three main challenges: avoiding attribute leakage between instances, supporting diverse instance descriptions, and maintaining consistency in iterative generation. To address attribute leakage, we propose the Multi-Instance Generation Controller (MIGC). MIGC generates multiple instances through a divide-and-conquer strategy, breaking down multi-instance shading into single-instance tasks with singular attributes, whose results are then integrated. To provide more types of instance descriptions, we develop MIGC++. MIGC++ allows attribute control through text and images and position control through boxes and masks. Lastly, we introduce the Consistent-MIG algorithm to enhance the iterative MIG ability of MIGC and MIGC++. This algorithm ensures consistency in unmodified regions during the addition, deletion, or modification of instances, and preserves the identity of instances when their attributes are changed. We introduce the COCO-MIG and Multimodal-MIG benchmarks to evaluate these methods. Extensive experiments on these benchmarks, along with the COCO-Position benchmark and DrawBench, demonstrate that our methods substantially outperform existing techniques while maintaining precise control over position, attribute, and quantity. Project page: https://github.com/limuloo/MIGC.
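To make the consistency requirement for iterative generation concrete, the following is a toy sketch in plain Python. It is not the Consistent-MIG algorithm: it only illustrates the general idea of keeping values from the previous result outside the edited regions and taking regenerated values inside them. The function names (`box_mask`, `preserve_unmodified`), the 2D grid standing in for an image or latent, and the box convention are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical convention: (x0, y0, x1, y1) with x1 and y1 exclusive.
Box = Tuple[int, int, int, int]
Grid = List[List[float]]


def box_mask(height: int, width: int, boxes: List[Box]) -> List[List[bool]]:
    """Return a boolean grid that is True inside any of the given boxes."""
    mask = [[False] * width for _ in range(height)]
    for x0, y0, x1, y1 in boxes:
        # Clamp each box to the grid so out-of-range boxes are harmless.
        for y in range(max(0, y0), min(height, y1)):
            for x in range(max(0, x0), min(width, x1)):
                mask[y][x] = True
    return mask


def preserve_unmodified(previous: Grid, regenerated: Grid,
                        edited_boxes: List[Box]) -> Grid:
    """Take `regenerated` inside the edited boxes and `previous` elsewhere."""
    height, width = len(previous), len(previous[0])
    mask = box_mask(height, width, edited_boxes)
    return [
        [regenerated[y][x] if mask[y][x] else previous[y][x]
         for x in range(width)]
        for y in range(height)
    ]


# Toy data: a 4x4 "previous" result and a fully regenerated candidate.
previous = [[1.0] * 4 for _ in range(4)]
regenerated = [[9.0] * 4 for _ in range(4)]

# Only the instance occupying the central 2x2 box was edited.
result = preserve_unmodified(previous, regenerated, [(1, 1, 3, 3)])
for row in result:
    print(row)
```

In this toy setting only the cells inside the edited box change, which mirrors the stated goal of keeping unmodified regions consistent when an instance is added, deleted, or modified.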
