UniCon: Unidirectional Information Flow for Effective Control of Large-Scale Diffusion Models

Fanghua Yu, Jinjin Gu, Jinfan Hu, Zheyuan Li, Chao Dong

TL;DR

This paper tackles the high computational cost of training adapters for large diffusion-based generators by removing gradient backpropagation through the diffusion backbone. It introduces UniCon, a unidirectional information flow where the diffusion network feeds a trainable adapter that directly outputs the final image, eliminating the need to compute diffusion gradients during adapter training. The approach yields substantial memory and speed gains (approximately a one-third reduction in VRAM and roughly 2.3x faster training) and allows larger adapters without extra compute, while achieving precise control and high-quality generation across multiple conditional tasks and backbone architectures. Ablation and comparison studies show UniCon outperforms ControlNet and T2I-Adapter in controllability and fidelity, with connector design (ZeroFT) and full preservation of the base diffusion model further reinforcing performance and stability.

Abstract

We introduce UniCon, a novel architecture designed to enhance control and efficiency in training adapters for large-scale diffusion models. Unlike existing methods that rely on bidirectional interaction between the diffusion model and the control adapter, UniCon implements a unidirectional flow from the diffusion network to the adapter, allowing the adapter alone to generate the final output. UniCon reduces computational demands by eliminating the need for the diffusion model to compute and store gradients during adapter training. Our results indicate that UniCon reduces GPU memory usage by one-third and trains 2.3 times faster at the same adapter parameter size. Additionally, without requiring extra computational resources, UniCon enables training adapters with twice as many parameters as existing ControlNets. In a series of image-conditional generation tasks, UniCon demonstrates precise responsiveness to control inputs and strong generation capabilities.
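
The core idea of the abstract, that the adapter trains on features the diffusion model hands it one way, can be illustrated with a toy training step. This is a minimal sketch under loudly stated assumptions, not the authors' implementation: the two-block `frozen_backbone`, the linear adapter, and the helper names `adapter_forward`, `adapter_loss_and_grad`, and `train_step` are all hypothetical stand-ins. The point it shows is that once backbone features enter the adapter as constants and the loss is taken on the adapter's output, the gradient involves only adapter weights and nothing is differentiated through the backbone.

```python
# Toy sketch of a UniCon-style (unidirectional) training step.
# Hypothetical throughout: a two-block "backbone" and a linear "adapter"
# stand in for the real diffusion model and control adapter.
from typing import List, Tuple


def frozen_backbone(x: float) -> List[float]:
    """Frozen backbone: returns the features tapped from each block."""
    h1 = 2.0 * x + 1.0
    h2 = 0.5 * h1 - 3.0
    return [h1, h2]  # plain floats, so nothing is kept for a backward pass


def adapter_forward(w: List[float], feats: List[float], control: float) -> float:
    """Adapter reads backbone features plus the control signal and emits the output."""
    return w[0] * feats[0] + w[1] * feats[1] + w[2] * control


def adapter_loss_and_grad(
    w: List[float], feats: List[float], control: float, target: float
) -> Tuple[float, List[float]]:
    """Squared error on the adapter output and its gradient w.r.t. adapter weights only."""
    err = adapter_forward(w, feats, control) - target
    # d(err^2)/dw_i = 2 * err * input_i; the backbone features act as constants.
    grad = [2.0 * err * feats[0], 2.0 * err * feats[1], 2.0 * err * control]
    return err * err, grad


def train_step(
    w: List[float], x: float, control: float, target: float, lr: float = 0.01
) -> Tuple[float, List[float]]:
    """One SGD step: backbone forward (no gradients), then adapter update."""
    feats = frozen_backbone(x)  # one-way flow: backbone -> adapter, never back
    loss, grad = adapter_loss_and_grad(w, feats, control, target)
    return loss, [wi - lr * gi for wi, gi in zip(w, grad)]


if __name__ == "__main__":
    w = [0.0, 0.0, 0.0]
    for step in range(5):
        loss, w = train_step(w, x=1.0, control=1.0, target=2.0)
        print(f"step {step}: loss={loss:.4f}")
```

Running it prints a shrinking loss. In a ControlNet-style layout the loss would instead be taken on the backbone's output, so the same gradient would have to pass back through the frozen blocks before reaching the adapter.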

Paper Structure

This paper contains 41 sections, 18 figures, 6 tables, 3 algorithms.

Figures (18)

  • Figure 1: Schematic comparison between our proposed UniCon and ControlNet. In UniCon, information flows unidirectionally from the diffusion model to the adapter network, which directly outputs the results. This design is computationally efficient because it does not need to compute and store gradients for the diffusion model. (c) shows results generated from downsampled images, and (d) shows outcomes based on depth maps. UniCon achieves better performance while using fewer resources.
  • Figure 2: The UniCon design for both DiT and SD U-Net. Some SD U-Net blocks are omitted due to space limits.
  • Figure 3: Schematic representation of the five different variants we covered in our ablation studies.
  • Figure 4: Comparison between decoder-part-focused UniCon and a variant that replaces the diffusion decoder part. The results indicate that if the complete pre-trained diffusion model is not preserved, generative capability declines significantly. These two models are shown in Figure 3 (d) and (e).
  • Figure 5: Comparison of different methods. We present qualitative comparisons with ControlNet [controlnet] and T2I-Adapter [t2i] on both Stable Diffusion (SD) [sd] and Diffusion Transformer (DiT) [DiT].
  • ...and 13 more figures
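
Figure 1's efficiency claim comes down to which frozen blocks sit on the backward path from the loss to the trainable adapter, because those blocks must keep their activations for backpropagation. The sketch below is a hypothetical simplification, not the exact architectures of Figures 1 and 2: the node names (`enc`, `dec`, `adapter`) and both toy graphs are assumptions, and the helper `frozen_on_backward_path` is illustrative. It marks a node as needing backprop when it is downstream of a trainable node and upstream of the loss. It does not reproduce the paper's measured memory or speed figures.

```python
# Hypothetical, simplified dataflow graphs; node names are illustrative only.
from typing import Dict, List, Set

Graph = Dict[str, List[str]]  # node -> downstream nodes

CONTROLNET_STYLE: Graph = {
    "input": ["enc", "adapter"],
    "control": ["adapter"],
    "enc": ["dec"],
    "adapter": ["dec"],  # adapter residuals are injected into the frozen decoder
    "dec": ["loss"],     # the frozen backbone produces the final output
}

UNICON_STYLE: Graph = {
    "input": ["enc"],
    "control": ["adapter"],
    "enc": ["dec", "adapter"],
    "dec": ["adapter"],   # backbone features flow one way into the adapter
    "adapter": ["loss"],  # the adapter produces the final output
}


def reachable(graph: Graph, starts: Set[str]) -> Set[str]:
    """All nodes reachable from `starts` (inclusive), via iterative DFS."""
    seen: Set[str] = set()
    stack = list(starts)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph.get(node, []))
    return seen


def reverse(graph: Graph) -> Graph:
    """Flip every edge so upstream nodes can be found by forward search."""
    rev: Graph = {}
    for src, dsts in graph.items():
        for dst in dsts:
            rev.setdefault(dst, []).append(src)
    return rev


def frozen_on_backward_path(graph: Graph, trainable: Set[str], loss: str = "loss") -> Set[str]:
    """Frozen nodes whose activations must be stored so gradients reach `trainable`.

    A node is on the backward path iff it is downstream of a trainable node
    and upstream of the loss.
    """
    on_path = reachable(graph, trainable) & reachable(reverse(graph), {loss})
    return on_path - trainable - {loss}


if __name__ == "__main__":
    print("ControlNet-style:", frozen_on_backward_path(CONTROLNET_STYLE, {"adapter"}))
    print("UniCon-style:    ", frozen_on_backward_path(UNICON_STYLE, {"adapter"}))
```

For the ControlNet-style graph the frozen decoder `dec` lies on the backward path: its weights receive no gradient, yet its activations must be kept so the gradient can flow through it. For the UniCon-style graph the set is empty, which is the structural reason the diffusion model needs neither gradient computation nor stored activations during adapter training.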