RepControlNet: ControlNet Reparameterization

Zhaoli Deng, Kaibin Zhou, Fanyi Wang, Zhenpeng Mi

TL;DR

RepControlNet tackles the high inference cost of controllable diffusion by introducing modal reparameterization, allowing conditional generation without added computation. It freezes a pretrained diffusion backbone, trains a parallel modal copy of convolutional and linear layers plus an adapter to encode conditioning, and reparameterizes the copy into the base model during inference, keeping the parameter count unchanged. Empirical results on SD1.5 and SDXL across Canny, Depth, semantic maps, and skeleton show competitive fidelity (FID/CLIP) with negligible overhead compared to the base models and without the extra parameters of ControlNet. The approach also demonstrates potential for identity-preserving generation when integrated with suitable identity features, indicating practical applicability for efficient controllable diffusion tasks.

Abstract

With the wide application of diffusion models, the high cost of inference resources has become an important bottleneck for their universal adoption. Controllable generation, as in ControlNet, is one of the key research directions for diffusion models, which makes research on inference acceleration and model compression all the more important. To address this problem, this paper proposes a modal reparameterization method, RepControlNet, which realizes controllable generation with diffusion models without increasing computation. During training, RepControlNet uses an adapter to modulate the modal information into the feature space, copies the learnable CNN and MLP layers of the original diffusion model to form the modal network, and initializes these weights based on the original weights and coefficients. Training optimizes only the parameters of the modal network. At inference time, the weights of the modal network are reparameterized together with the weights of the original diffusion model, so that RepControlNet is comparable to, or even surpasses, methods such as ControlNet, which use additional parameters and computation, without increasing the number of parameters. We have carried out extensive experiments on both SD1.5 and SDXL, and the experimental results show the effectiveness and efficiency of the proposed RepControlNet.
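
The reparameterization step described in the abstract relies on a general property of linear layers: two parallel branches that see the same input and whose outputs are combined with fixed coefficients can be folded into a single layer of the original size. The sketch below illustrates that property only. The mixing rule `(1 - alpha) * base(x) + alpha * modal(x)`, the coefficient name `alpha`, and the helpers `linear` and `merge_branches` are assumptions made for illustration, not the paper's exact formulation.

```python
import random


def linear(weight, bias, x):
    """Apply y = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weight, bias)]


def merge_branches(base, modal, alpha):
    """Fold a frozen base layer and a trained modal layer into one layer.

    Assumed (hypothetical) mixing rule:
        y = (1 - alpha) * base(x) + alpha * modal(x)
    """
    (w0, b0), (w1, b1) = base, modal
    weight = [[(1 - alpha) * p + alpha * q for p, q in zip(r0, r1)]
              for r0, r1 in zip(w0, w1)]
    bias = [(1 - alpha) * p + alpha * q for p, q in zip(b0, b1)]
    return weight, bias


rng = random.Random(0)


def rand_layer(n_out, n_in):
    """Random (weight, bias) pair standing in for real model weights."""
    weight = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
    bias = [rng.uniform(-1, 1) for _ in range(n_out)]
    return weight, bias


base = rand_layer(3, 4)    # stands in for a frozen pretrained layer
modal = rand_layer(3, 4)   # stands in for its trained modal copy
x = [rng.uniform(-1, 1) for _ in range(4)]
alpha = 0.3                # illustrative coefficient, not from the paper

# Training-time view: two parallel branches, outputs mixed.
two_branch = [(1 - alpha) * u + alpha * v
              for u, v in zip(linear(*base, x), linear(*modal, x))]

# Inference-time view: a single merged layer with the base layer's shape.
merged_layer = merge_branches(base, modal, alpha)
merged = linear(*merged_layer, x)

max_gap = max(abs(u - v) for u, v in zip(two_branch, merged))
```

Here `max_gap` stays at floating-point rounding level, and the merged layer has the same shape as the base layer, which is the sense in which inference adds no parameters. A convolution is also linear in its weights, so the same folding applies to convolutional layers under the same assumption.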

Paper Structure (19 sections, 4 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Flowchart of the training and reparameterization process of RepControlNet.
  • Figure 2: Image generation results with Canny Edge conditioning. Captions: Christmas greeting with wreath and red bauble on white background, Scandinavian Christmas text. A beautiful unicorn with a rainbow on a background of clouds, unicorn cartoon character for printing posters. The marble bathroom features white and black marble walls and floors.
  • Figure 3: Image generation results with Depth Map conditioning. Captions: A parking lot with a stop sign and a building. A car parked by a red brick wall. A large red brick building.
  • Figure 4: Image generation results with COCO segmentation conditioning. Captions: An upside down stop sign by the road. A dog standing on a bench with an orange tree in back. A dog is lying next to a laptop computer.
  • Figure 5: Image generation results with ADE segmentation conditioning. Captions: A large stone house. Two beds in a bedroom. A bathroom with two sinks and a tub. People looking at an aquarium with fish in it.
  • ...and 2 more figures