Improving Fine-Grained Control via Aggregation of Multiple Diffusion Models
Conghan Yue, Zhengwei Peng, Shiyan Du, Zhi Ji, Chuangjian Cai, Le Wan, Dongyu Zhang
TL;DR
AMDM introduces a training-free framework for fine-grained conditional generation that aggregates latent features from multiple diffusion models within the same ecosystem. It combines two components, spherical aggregation and deviation optimization, to merge information from several models into a base model during the early diffusion steps; the base model alone then completes sampling to produce the final output. The approach is supported by theoretical insights into latent manifolds and the proximity of conditions, and it is validated empirically through aggregation experiments with InteractDiffusion, MIGC, and IP-Adapter, which show improved attribute, interaction, and style control without additional training. Ablations further show that aggregation matters most in the early stages. Because AMDM operates directly in latent space and avoids costly training, it offers a practical, scalable method for fine-grained conditional diffusion generation that applies broadly across SD-based ecosystems.
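The TL;DR names spherical aggregation but does not define it here. As one plausible reading, which is our assumption and not the paper's formulation, the sketch merges latents from several models by taking the weighted mean of their unit directions and rescaling it to the weighted mean norm. The motivation is that a plain Euclidean average of nearly orthogonal high-dimensional Gaussian latents shrinks the norm and pulls the result off the thin shell where noisy latents concentrate. The function name `spherical_aggregate` is hypothetical, and deviation optimization is not modeled.

```python
import numpy as np


def spherical_aggregate(latents, weights):
    """Hypothetical sketch: merge latents while staying on a hypersphere.

    Assumption (not from the paper): aggregate unit directions with a convex
    combination, then rescale to the weighted mean of the input norms.
    Exactly antipodal inputs with equal weights would give a zero direction.
    """
    latents = np.asarray(latents, dtype=np.float64)  # shape (n_models, ...)
    weights = np.asarray(weights, dtype=np.float64)
    weights = weights / weights.sum()                # normalize to a convex combination

    flat = latents.reshape(len(latents), -1)         # one row per model
    norms = np.linalg.norm(flat, axis=1)

    # Weighted mean of unit directions, projected back onto the unit sphere.
    direction = np.tensordot(weights, flat / norms[:, None], axes=1)
    direction /= np.linalg.norm(direction)

    # Restore a radius comparable to the inputs instead of the shrunken mean norm.
    radius = np.dot(weights, norms)
    return (radius * direction).reshape(latents.shape[1:])
```

For example, with two SD-sized latents of shape `(4, 64, 64)`, `spherical_aggregate([x_base, x_other], [0.5, 0.5])` returns a latent whose norm is the average of the two input norms, whereas `(x_base + x_other) / 2` would have a noticeably smaller norm.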
Abstract
While many diffusion models perform well at controlling particular aspects such as style, character, and interaction, they struggle with fine-grained control because of dataset limitations and the intricate model architectures it would otherwise require. This paper introduces a novel training-free algorithm for fine-grained generation, called Aggregation of Multiple Diffusion Models (AMDM). The algorithm integrates features in the latent data space from multiple diffusion models within the same ecosystem into a specified model, thereby activating particular features and enabling fine-grained control. Experimental results demonstrate that AMDM significantly improves fine-grained control without training, validating its effectiveness. The experiments also reveal that diffusion models initially focus on features such as position, attributes, and style, while later stages improve generation quality and consistency. AMDM offers a new perspective on the challenges of fine-grained conditional generation in diffusion models. Specifically, one can fully utilize existing conditional diffusion models, or develop new ones, that each control a specific aspect, and then aggregate them with the AMDM algorithm. This eliminates the need to construct complex datasets, design intricate model architectures, and incur high training costs. Code is available at: https://github.com/Hammour-steak/AMDM.
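The observation that early steps settle position, attributes, and style while later steps refine quality suggests aggregating only at the start of sampling. The following is a minimal, hypothetical sketch of that schedule: each model is abstracted as a callable `step(x, t)` that takes one denoising step, all models step from the shared latent for the first `agg_steps` iterations and their outputs are merged, and afterwards only the base model is used. The names `aggregated_sampling`, `agg_steps`, and the pluggable `aggregate` function are our assumptions, not the AMDM repository's API.

```python
from typing import Callable, List, Sequence

import numpy as np

# Hypothetical interface: one denoising step maps (latent, iteration index) to the next latent.
Step = Callable[[np.ndarray, int], np.ndarray]


def aggregated_sampling(
    base_step: Step,
    other_steps: Sequence[Step],
    x_T: np.ndarray,
    num_steps: int,
    agg_steps: int,
    aggregate: Callable[[List[np.ndarray]], np.ndarray],
) -> np.ndarray:
    """Aggregate model outputs early, then sample with the base model alone.

    `t` counts sampler iterations from the start (pure noise), not the
    diffusion timestep, so `t < agg_steps` means "early in sampling".
    """
    x = x_T
    for t in range(num_steps):
        x_base = base_step(x, t)
        if t < agg_steps:
            # Early phase: every model steps from the shared latent; merge the results.
            candidates = [x_base] + [step(x, t) for step in other_steps]
            x = aggregate(candidates)
        else:
            # Late phase: the base model alone refines quality and consistency.
            x = x_base
    return x
```

With toy steps `lambda x, t: x + 1` (base) and `lambda x, t: x + 3` (other), a plain mean as `aggregate`, five iterations, and `agg_steps=2`, a zero latent becomes 7 in every entry: two merged steps of +2 followed by three base-only steps of +1. A sphere-preserving aggregator can be passed in place of the mean.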
