Meta ControlNet: Enhancing Task Adaptation via Meta Learning

Junjie Yang; Jinze Zhao; Peihao Wang; Zhangyang Wang; Yingbin Liang

TL;DR

This work addresses the slow adaptation and limited zero-shot capabilities of ControlNet by introducing Meta ControlNet, a meta-learning framework that learns a robust initialization for ControlNet through FO-MAML. A novel layer-freezing strategy freezes Encoder Block 4 and the Middle Block during meta-training to separate task-specific and shared representations, enabling rapid learning and zero-shot edge control. Meta ControlNet reduces the required training steps from around 5000 to about 1000, achieves zero-shot capability for edge-based tasks, and adapts to non-edge tasks such as Human Pose in a few shots (≈100 finetuning steps). In extensive experiments evaluated with DreamSim metrics, Meta ControlNet outperforms Prompt Diffusion in both few-shot and zero-shot generalization, a significant improvement for practical controllable diffusion. The approach offers a scalable, data-efficient path to broad, task-aware image synthesis with diffusion models.

Abstract

Diffusion-based image synthesis has attracted extensive attention recently. In particular, ControlNet, which uses image-based prompts, exhibits powerful capability in image tasks such as Canny edge detection and generates images well aligned with these prompts. However, vanilla ControlNet generally requires extensive training of around 5000 steps to achieve desirable control for a single task. Recent context-learning approaches have improved its adaptability, but mainly for edge-based tasks, and they rely on paired examples. Thus, two important open issues are yet to be addressed to reach the full potential of ControlNet: (i) zero-shot control for certain tasks and (ii) faster adaptation for non-edge-based tasks. In this paper, we introduce a novel Meta ControlNet method, which adopts the task-agnostic meta-learning technique and features a new layer-freezing design. Meta ControlNet significantly reduces the learning steps needed to attain control ability from 5000 to 1000. Further, Meta ControlNet exhibits direct zero-shot adaptability in edge-based tasks without any finetuning, and achieves control within only 100 finetuning steps in more complex non-edge tasks such as Human Pose, outperforming all existing methods. The code is available at https://github.com/JunjieYang97/Meta-ControlNet.
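
To make the idea of first-order meta learning with frozen layers concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation: the scalar "blocks", the quadratic per-task losses, the step sizes, and the helper names (`grad`, `sgd_step`, `fomaml_step`) are all assumptions chosen for illustration. Only the overall pattern follows the description above: adapt on each meta task, reuse the gradient at the adapted parameters as the meta gradient (the first-order approximation), and skip updates for Encoder Block 4 and the Middle Block.

```python
# Toy first-order MAML (FO-MAML) with a frozen parameter subset.
# Assumptions: each "block" is a single float, each task is a quadratic
# loss pulling every block toward a task-specific target, and the
# learning rates are arbitrary. None of this comes from the paper's code.

BLOCKS = ["encoder_block1", "encoder_block4", "middle_block", "decoder_block1"]
FROZEN = {"encoder_block4", "middle_block"}  # kept fixed during meta-training


def grad(params, target):
    """Gradient of the toy loss 0.5 * sum((p - t)^2) with respect to each block."""
    return {k: params[k] - target[k] for k in params}


def sgd_step(params, grads, lr):
    """One gradient step that leaves frozen blocks unchanged."""
    return {k: v if k in FROZEN else v - lr * grads[k] for k, v in params.items()}


def fomaml_step(meta_params, task_targets, inner_lr=0.1, outer_lr=0.05, inner_steps=1):
    """One meta-update: adapt per task, then average first-order meta gradients."""
    meta_grads = {k: 0.0 for k in meta_params}
    for target in task_targets:
        adapted = dict(meta_params)
        for _ in range(inner_steps):
            adapted = sgd_step(adapted, grad(adapted, target), inner_lr)
        # First-order approximation: take the gradient at the adapted
        # parameters and ignore derivatives through the inner-loop updates.
        g = grad(adapted, target)
        for k in meta_grads:
            meta_grads[k] += g[k] / len(task_targets)
    return sgd_step(meta_params, meta_grads, outer_lr)


# Three toy tasks standing in for the meta tasks (HED, Segmentation, Depth).
tasks = [{k: t for k in BLOCKS} for t in (1.0, 2.0, 3.0)]
theta_sd = {k: 0.0 for k in BLOCKS}  # stand-in for the starting initialization

theta_meta = dict(theta_sd)
for _ in range(200):
    theta_meta = fomaml_step(theta_meta, tasks)
```

In this toy setting the trainable blocks drift toward a point that is a short adaptation step away from every task target, while the frozen blocks keep their initial values. The Stable Diffusion backbone, which stays fixed throughout, is not modeled here at all.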

Paper Structure (19 sections, 3 equations, 11 figures, 6 tables)

Introduction
Meta ControlNet
Algorithm Design
Task Selection
Experimental Results
Fast Control Acquiring in Training
Zero-Shot Capability for Edge-Based Tasks
Fast Adaptation for Non-Edge Tasks
Quantitative results
Ablation Study
Layer Freezing
Decoder Connection
Conclusion
Appendix
Related Work
...and 4 more sections

Figures (11)

Figure 1: Trained from the Stable Diffusion initialization $\theta_{SD}$, the meta-learned initialization $\theta_{meta}$ is used for adaptation to various tasks.
Figure 2: Meta ControlNet training pipeline. The ControlNet parameters are meta-updated via meta tasks (HED, Segmentation, Depth). Stable Diffusion parameters are fixed, and the ControlNet middle layers (Encoder Block 4 and Middle Block) are frozen during the training phase.
Figure 3: Validation set samples from each training task (HED, Depth, Segmentation) after 1000 steps of training updates.
Figure 4: Samples from edge-based tasks (Canny, Normal) in zero-shot adaptation.
Figure 5: Sample comparison between the proposed Meta ControlNet and the Prompt Diffusion (PD) baseline for the Canny task in few-shot finetuning. PD requires example pairs to update and is thus only available in odd-number few-shot settings.
...and 6 more figures
