Meta ControlNet: Enhancing Task Adaptation via Meta Learning
Junjie Yang, Jinze Zhao, Peihao Wang, Zhangyang Wang, Yingbin Liang
TL;DR
This work addresses the slow adaptation and limited zero-shot capabilities of ControlNet by introducing Meta ControlNet, a meta-learning framework that learns a robust initialization for ControlNet through first-order MAML (FO-MAML). A novel layer-freezing strategy freezes Encoder Block4 and the Middle Block during meta-training to separate task-specific and shared representations, enabling rapid learning and zero-shot edge control. Meta ControlNet reduces the required training steps from around 5000 to about 1000, achieves zero-shot control for edge-based tasks, and adapts to non-edge tasks such as Human Pose in about 100 finetuning steps. Across extensive experiments using DreamSim metrics, Meta ControlNet outperforms Prompt Diffusion in generalization, including the zero-shot setting. The approach offers a scalable, data-efficient path to broad, task-aware image synthesis with diffusion models.
Abstract
Diffusion-based image synthesis has attracted extensive attention recently. In particular, ControlNet, which uses image-based prompts, exhibits powerful capability in image tasks such as Canny edge detection and generates images well aligned with these prompts. However, vanilla ControlNet generally requires extensive training of around 5000 steps to achieve desirable control for a single task. Recent context-learning approaches have improved its adaptability, but mainly for edge-based tasks, and they rely on paired examples. Thus, two important open issues remain to be addressed to reach the full potential of ControlNet: (i) zero-shot control for certain tasks and (ii) faster adaptation for non-edge-based tasks. In this paper, we introduce a novel Meta ControlNet method, which adopts a task-agnostic meta-learning technique and features a new layer-freezing design. Meta ControlNet significantly reduces the number of learning steps needed to attain control ability from 5000 to 1000. Further, Meta ControlNet exhibits direct zero-shot adaptability in edge-based tasks without any finetuning, and achieves control within only 100 finetuning steps in more complex non-edge tasks such as Human Pose, outperforming all existing methods. The code is available at https://github.com/JunjieYang97/Meta-ControlNet.
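
To make the combination of first-order meta learning and layer freezing more concrete, the following is a minimal sketch on a toy problem, not the released implementation. It assumes quadratic task losses in place of the diffusion training loss, a plain parameter vector in place of the ControlNet weights, and a boolean mask `frozen` standing in for the frozen blocks. Applying the mask to both the inner and the outer updates is an assumption of this sketch, and all function names and hyperparameters are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def grad_quadratic(theta: Sequence[float], center: Sequence[float]) -> Vector:
    """Gradient of the toy task loss 0.5 * ||theta - center||^2 (assumed loss)."""
    return [t - c for t, c in zip(theta, center)]


def masked_sgd_step(theta: Sequence[float], grad: Sequence[float],
                    lr: float, frozen: Sequence[bool]) -> Vector:
    """One SGD step that leaves frozen coordinates untouched."""
    return [t if f else t - lr * g for t, g, f in zip(theta, grad, frozen)]


def fomaml_step(theta: Sequence[float], task_centers: Sequence[Sequence[float]],
                frozen: Sequence[bool], inner_lr: float = 0.1,
                outer_lr: float = 0.5, inner_steps: int = 3) -> Vector:
    """One first-order MAML meta-update over a batch of toy tasks."""
    meta_grad = [0.0] * len(theta)
    for center in task_centers:
        # Inner loop: adapt a copy of the shared initialization to this task.
        adapted = list(theta)
        for _ in range(inner_steps):
            adapted = masked_sgd_step(
                adapted, grad_quadratic(adapted, center), inner_lr, frozen)
        # First-order approximation: reuse the gradient at the adapted
        # weights as the meta-gradient, ignoring second-order terms.
        task_grad = grad_quadratic(adapted, center)
        meta_grad = [m + g / len(task_centers)
                     for m, g in zip(meta_grad, task_grad)]
    # Outer loop: update the initialization, skipping frozen coordinates.
    return masked_sgd_step(theta, meta_grad, outer_lr, frozen)


def meta_train(theta: Sequence[float], task_centers: Sequence[Sequence[float]],
               frozen: Sequence[bool], meta_steps: int = 200) -> Vector:
    """Repeat the meta-update to learn an initialization shared across tasks."""
    current = list(theta)
    for _ in range(meta_steps):
        current = fomaml_step(current, task_centers, frozen)
    return current
```

On this toy problem, calling `meta_train([0.0, 0.0, 5.0], [[1.0, 2.0, 0.0], [3.0, 4.0, 0.0]], [False, False, True])` moves the two trainable coordinates toward the mean of the task centers, while the frozen coordinate keeps its initial value. The point of the sketch is only the structure: the learned initialization sits where a few gradient steps suffice for each task, and the frozen part is never modified during meta-training.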
