Improving Fine-Grained Control via Aggregation of Multiple Diffusion Models

Conghan Yue; Zhengwei Peng; Shiyan Du; Zhi Ji; Chuangjian Cai; Le Wan; Dongyu Zhang

TL;DR

AMDM is a training-free framework for fine-grained conditional generation that aggregates latent features from multiple diffusion models within the same ecosystem. It combines two components, spherical aggregation and deviation optimization, to merge information from several models into a base model during the early diffusion steps, and then samples from the base model alone to produce the final output. The approach is supported by theoretical insights about latent manifolds and the proximity of conditioning. It is validated empirically through aggregation experiments with InteractDiffusion, MIGC, and IP-Adapter, which show improved attribute, interaction, and style control without additional training, and through ablations that illustrate the importance of early-stage aggregation. Because it operates directly in latent space and avoids training, AMDM offers a practical method for fine-grained conditional generation that applies across Stable Diffusion (SD)-based ecosystems.
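The control flow described above (aggregate during the early steps, then sample from the base model only) can be sketched as follows. This is a minimal illustration, not the authors' implementation: it uses standard spherical linear interpolation (slerp) as a stand-in for spherical aggregation, it omits deviation optimization entirely, and the names `slerp`, `sample_with_early_aggregation`, `base_step`, `other_step`, `agg_steps`, and `weight` are all hypothetical. Latents are treated as flat lists of floats for simplicity.

```python
import math
from typing import Callable, List, Sequence

Vector = List[float]
# Hypothetical signature: one denoising step mapping (z_t, t) to z_{t-1}.
StepFn = Callable[[Sequence[float], int], Vector]


def slerp(z_base: Sequence[float], z_other: Sequence[float], weight: float) -> Vector:
    """Spherical linear interpolation: weight=0 gives z_base, weight=1 gives z_other."""
    norm_a = math.sqrt(sum(a * a for a in z_base))
    norm_b = math.sqrt(sum(b * b for b in z_other))
    phi = 0.0
    if norm_a > 0.0 and norm_b > 0.0:
        dot = sum(a * b for a, b in zip(z_base, z_other))
        # Clamp to [-1, 1] to guard against rounding before acos.
        phi = math.acos(max(-1.0, min(1.0, dot / (norm_a * norm_b))))
    if phi < 1e-6:
        # Zero or nearly parallel vectors: fall back to linear interpolation.
        return [(1.0 - weight) * a + weight * b for a, b in zip(z_base, z_other)]
    s = math.sin(phi)
    w_a = math.sin((1.0 - weight) * phi) / s
    w_b = math.sin(weight * phi) / s
    return [w_a * a + w_b * b for a, b in zip(z_base, z_other)]


def sample_with_early_aggregation(
    z_init: Sequence[float],
    base_step: StepFn,
    other_step: StepFn,
    num_steps: int,
    agg_steps: int,
    weight: float = 0.5,
) -> Vector:
    """Aggregate two models for the first agg_steps steps, then use the base model only."""
    z = list(z_init)
    for i in range(num_steps):
        t = num_steps - i  # time index counts down from num_steps to 1
        z_base = base_step(z, t)
        if i < agg_steps:
            # Early phase: pull information from the second model into the base latent.
            z_other = other_step(z, t)
            z = slerp(z_base, z_other, weight)
        else:
            # Late phase: plain sampling from the base model.
            z = z_base
    return z
```

In a real pipeline, `base_step` and `other_step` would wrap one sampler step of two conditional models that share a latent space; here they are placeholders so the scheduling logic can be read in isolation.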

Abstract

While many diffusion models perform well when controlling particular aspects such as style, character, and interaction, they struggle with fine-grained control due to dataset limitations and intricate model architecture design. This paper introduces a novel training-free algorithm for fine-grained generation, called Aggregation of Multiple Diffusion Models (AMDM). The algorithm integrates features in the latent data space from multiple diffusion models within the same ecosystem into a specified model, thereby activating particular features and enabling fine-grained control. Experimental results demonstrate that AMDM significantly improves fine-grained control without training, validating its effectiveness. Additionally, it reveals that diffusion models initially focus on features such as position, attributes, and style, with later stages improving generation quality and consistency. AMDM offers a new perspective for tackling the challenges of fine-grained conditional generation in diffusion models. Specifically, it allows us to fully utilize existing or develop new conditional diffusion models that control specific aspects, and then aggregate them using the AMDM algorithm. This eliminates the need for constructing complex datasets, designing intricate model architectures, and incurring high training costs. Code is available at: https://github.com/Hammour-steak/AMDM.

Paper Structure (37 sections, 5 theorems, 80 equations, 15 figures, 10 tables, 1 algorithm)

Introduction
Preliminaries: Stable Diffusion
Method
Analysis
Do aggregation operations exist in different diffusion models?
Which models can achieve fine-grained generation through aggregation operations?
Algorithm: AMDM
Spherical Aggregation.
Deviation Optimization.
Experiments
Aggregation Experiments
InteractDiffusion and MIGC.
InteractDiffusion and IP-Adapter.
InteractDiffusion, MIGC and IP-Adapter.
Ablation Studies
...and 22 more sections

Key Result

Proposition 3.1

Let $\mathbf{z}'_t$ denote the aggregated variable at time step $t$. For the sampling step from $t$ to $t-1$, two diffusion models $p_{\theta_1}$ and $p_{\theta_2}$ sample $\mathbf{z}^{\theta_1}_{t-1}$ and $\mathbf{z}^{\theta_2}_{t-1}$, respectively, from (cfg). Then, with probability at least $1-\gamma$, [...] where $\varphi$ is the angle between $\mathbf{z}^{\theta_1}_{t-1}$ and $\mathbf{z}^{\theta_2}_{t-1}$, [...]
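
For orientation, the angle $\varphi$ in the proposition can be read as the usual Euclidean angle between the two sampled latents. This assumes the latents are flattened to vectors and compared with the standard inner product, which this summary does not state explicitly:

$$
\cos\varphi = \frac{\langle \mathbf{z}^{\theta_1}_{t-1},\, \mathbf{z}^{\theta_2}_{t-1} \rangle}{\lVert \mathbf{z}^{\theta_1}_{t-1} \rVert \, \lVert \mathbf{z}^{\theta_2}_{t-1} \rVert}, \qquad \varphi \in [0, \pi].
$$

Under this reading, a small $\varphi$ means the two models propose nearly the same direction in latent space at step $t-1$.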

Figures (15)

Figure 1: Examples of fine-grained conditional control of the same caption by different models.
Figure 2: Geometry of AMDM (Left) and Deviation Optimization (Right). The algorithm employs spherical aggregation and deviation optimization to incorporate conditional information during the initial steps. Subsequently, direct sampling is applied to expedite the process and generate high-quality images.
Figure 3: Visual results of aggregating MIGC into InteractDiffusion.
Figure 4: Visual results of aggregating IP-Adapter into InteractDiffusion.
Figure 5: Visual results of aggregating MIGC and IP-Adapter into InteractDiffusion.
...and 10 more figures

Theorems & Definitions (5)

Proposition 3.1
Proposition 3.2
Corollary 3.3
Lemma 1.1
Proposition 4.1
