Universal Few-Shot Spatial Control for Diffusion Models
Kiet T. Nguyen, Chanhyuk Lee, Donggyun Kim, Dong Hoon Lee, Seunghoon Hong
TL;DR
This work introduces Universal Few-Shot Control (UFC), a unified, data-efficient framework for steering frozen diffusion models with unseen spatial conditions. UFC uses patch-wise matching over a small support set to interpolate task-specific control features, and combines this with episodic meta-training and parameter-efficient fine-tuning to generalize across diverse spatial modalities. In experiments across six spatial tasks and two backbones (UNet and DiT), UFC achieves fine-grained few-shot control with as few as 30 annotated examples and remains competitive with fully supervised baselines when fine-tuned on 0.1% of the full training data, while maintaining image quality. This makes spatial control practical when labeled data is limited, and the approach carries across architectures. Two limitations remain: the method targets spatial control rather than appearance-preserving tasks, and it still requires some fine-tuning for each new control type. Both point to future work on in-context-style adaptation and broader task coverage.
Abstract
Spatial conditioning in pretrained text-to-image diffusion models has significantly improved fine-grained control over the structure of generated images. However, existing control adapters adapt poorly and incur high training costs when they encounter novel spatial control conditions that differ substantially from the training tasks. To address this limitation, we propose Universal Few-Shot Control (UFC), a versatile few-shot control adapter capable of generalizing to novel spatial conditions. Given a few image-condition pairs of an unseen task and a query condition, UFC leverages the analogy between query and support conditions to construct task-specific control features, instantiated by a matching mechanism and an update on a small set of task-specific parameters. Experiments on six novel spatial control tasks show that UFC, fine-tuned with only 30 annotated examples of novel tasks, achieves fine-grained control consistent with the spatial conditions. Notably, when fine-tuned with 0.1% of the full training data, UFC achieves performance competitive with fully supervised baselines on various control tasks. We also show that UFC is agnostic to the diffusion backbone and demonstrate its effectiveness on both UNet and DiT architectures. Code is available at https://github.com/kietngt00/UFC.
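To make the matching idea concrete, the following is a minimal sketch of one plausible reading of it: each query-condition patch is compared with every support-condition patch, and the resulting similarity weights interpolate the paired support-image features. This is not the released UFC implementation. The function name `patchwise_matching`, the scaled dot-product similarity, the softmax weighting, and all tensor shapes are assumptions made for illustration only.

```python
import numpy as np


def patchwise_matching(query_cond, support_cond, support_img):
    """Interpolate control features for query patches from support patches.

    Hypothetical shapes (assumed for this sketch, not taken from the paper):
      query_cond:   (Nq, d)   patch embeddings of the query condition
      support_cond: (Ns, d)   patch embeddings of the stacked support conditions
      support_img:  (Ns, dv)  patch embeddings of the paired support images
    Returns the (Nq, dv) interpolated features and the (Nq, Ns) weights.
    """
    d = query_cond.shape[-1]
    # Similarity between every query patch and every support patch.
    scores = query_cond @ support_cond.T / np.sqrt(d)
    # Numerically stable softmax over the support patches.
    scores = scores - scores.max(axis=-1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each query patch receives a convex combination of support-image features.
    return weights @ support_img, weights


# Toy usage with random embeddings: 16 query patches, 3 support pairs of 16 patches.
rng = np.random.default_rng(0)
q = rng.normal(size=(16, 8))
k = rng.normal(size=(3 * 16, 8))
v = rng.normal(size=(3 * 16, 32))
features, weights = patchwise_matching(q, k, v)
print(features.shape, weights.shape)
```

In this sketch the support set enters only through `support_cond` and `support_img`, so adding more support pairs simply adds rows to both arrays. The abstract also describes an update on a small set of task-specific parameters, which this sketch does not model.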
