UniCon: Unidirectional Information Flow for Effective Control of Large-Scale Diffusion Models
Fanghua Yu, Jinjin Gu, Jinfan Hu, Zheyuan Li, Chao Dong
TL;DR
This paper tackles the high computational cost of training adapters for large diffusion-based generators by removing gradient backpropagation through the diffusion backbone. It introduces UniCon, a unidirectional design in which the diffusion network feeds features to a trainable adapter and the adapter alone produces the final output, so no gradients for the diffusion model need to be computed or stored during adapter training. With the adapter size held fixed, this cuts GPU memory use by about one-third and makes training roughly 2.3x faster; without extra computational resources, it allows adapters with twice the parameters of existing ControlNets. UniCon achieves precise control and high-quality generation across multiple conditional tasks and backbone architectures. Ablation and comparison studies show that UniCon outperforms ControlNet and T2I-Adapter in controllability and fidelity, and that the connector design (ZeroFT) and leaving the base diffusion model fully intact further improve performance and stability.
Abstract
We introduce UniCon, a novel architecture designed to enhance control and efficiency in training adapters for large-scale diffusion models. Unlike existing methods that rely on bidirectional interaction between the diffusion model and control adapter, UniCon implements a unidirectional flow from the diffusion network to the adapter, allowing the adapter alone to generate the final output. UniCon reduces computational demands by eliminating the need for the diffusion model to compute and store gradients during adapter training. Our results indicate that UniCon reduces GPU memory usage by one-third and increases training speed by 2.3 times, while maintaining the same adapter parameter size. Additionally, without requiring extra computational resources, UniCon enables the training of adapters with double the parameter volume of existing ControlNets. In a series of image conditional generation tasks, UniCon has demonstrated precise responsiveness to control inputs and exceptional generation capabilities.
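The unidirectional flow described above can be illustrated with a minimal PyTorch sketch. It is a toy under stated assumptions, not the authors' implementation: `TinyBackbone`, `TinyAdapter`, the plain linear connectors (which are not the paper's ZeroFT connector), the layer sizes, and the MSE objective are all hypothetical stand-ins. The point it shows is that the frozen backbone runs in forward-only mode, its features flow one way into the adapter, and the adapter alone produces the output that receives the loss.

```python
import torch
import torch.nn as nn


class TinyBackbone(nn.Module):
    """Hypothetical stand-in for the frozen diffusion network."""

    def __init__(self, dim=16, depth=3):
        super().__init__()
        self.blocks = nn.ModuleList([nn.Linear(dim, dim) for _ in range(depth)])

    def forward(self, x):
        feats = []
        for block in self.blocks:
            x = torch.relu(block(x))
            feats.append(x)  # expose every intermediate feature to the adapter
        return feats


class TinyAdapter(nn.Module):
    """Hypothetical trainable adapter that produces the final output itself."""

    def __init__(self, dim=16, depth=3):
        super().__init__()
        self.blocks = nn.ModuleList([nn.Linear(dim, dim) for _ in range(depth)])
        # Assumption: plain linear connectors, used here only for illustration.
        self.connectors = nn.ModuleList([nn.Linear(dim, dim) for _ in range(depth)])
        self.head = nn.Linear(dim, dim)

    def forward(self, cond, backbone_feats):
        h = cond
        for block, connector, feat in zip(self.blocks, self.connectors, backbone_feats):
            # Information flows backbone -> adapter only; nothing is fed back.
            h = torch.relu(block(h + connector(feat)))
        return self.head(h)


def train_step(backbone, adapter, optimizer, noisy, cond, target):
    # Forward-only pass: no autograd graph is built for the backbone.
    with torch.no_grad():
        feats = backbone(noisy)
    pred = adapter(cond, feats)
    loss = nn.functional.mse_loss(pred, target)
    optimizer.zero_grad()
    loss.backward()  # gradients reach adapter parameters only
    optimizer.step()
    return loss.item()


torch.manual_seed(0)
backbone = TinyBackbone().eval()
for p in backbone.parameters():
    p.requires_grad_(False)  # the base model stays untouched

adapter = TinyAdapter()
optimizer = torch.optim.Adam(adapter.parameters(), lr=1e-2)

# Random tensors stand in for noisy inputs, control signals, and targets.
noisy, cond, target = torch.randn(8, 16), torch.randn(8, 16), torch.randn(8, 16)
losses = [train_step(backbone, adapter, optimizer, noisy, cond, target) for _ in range(50)]
```

In a bidirectional design such as ControlNet, adapter outputs are injected back into the diffusion network, so the loss depends on the backbone's activations and backpropagation must pass through it even when its weights are frozen. In the sketch, the loss depends only on the adapter's output, which is why the backbone call can sit inside `torch.no_grad()`.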
