Ada-Adapter: Fast Few-shot Style Personalization of Diffusion Model with Pre-trained Image Encoder
Jia Liu, Changlin Li, Qirui Sun, Jiahui Ming, Chen Fang, Jue Wang, Bing Zeng, Shuaicheng Liu
TL;DR
The paper tackles the high data and compute costs of diffusion-model style transfer by introducing Ada-Adapter, a framework that pairs a pre-trained image encoder with off-the-shelf diffusion models to enable zero-shot and few-shot style personalization. It uses a hierarchical, layer-wise conditioning strategy to disentangle style from content and to balance image priors against text prompts, combined with multi-modal fine-tuning using LoRA. Empirical results on 16 style datasets show that Ada-Adapter delivers superior stylization quality and text alignment while requiring only 3–5 reference images and minutes of training, outperforming existing zero-shot and few-shot baselines. By reducing the required data to a handful of images and the training time to minutes, the approach lowers the practical barrier to diffusion-based style personalization for creators and practitioners.
Abstract
Fine-tuning advanced diffusion models for high-quality image stylization usually requires large training datasets and substantial computational resources, hindering their practical applicability. We propose Ada-Adapter, a novel framework for few-shot style personalization of diffusion models. Ada-Adapter leverages off-the-shelf diffusion models and pre-trained image feature encoders to learn a compact style representation from a limited set of source images. Our method enables efficient zero-shot style transfer utilizing a single reference image. Furthermore, with a small number of source images (three to five are sufficient) and a few minutes of fine-tuning, our method can capture intricate style details and conceptual characteristics, generating high-fidelity stylized images that align well with the provided text prompts. We demonstrate the effectiveness of our approach on various artistic styles, including flat art, 3D rendering, and logo design. Our experimental results show that Ada-Adapter outperforms existing zero-shot and few-shot stylization methods in terms of output quality, diversity, and training efficiency.
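To give a rough sense of why fine-tuning can finish in minutes, the sketch below illustrates the general LoRA idea of training a low-rank update on top of a frozen weight matrix. It is a minimal illustration of standard LoRA, not the authors' implementation: the layer sizes, the rank, the `alpha` scale, and the helper names `matmul` and `lora_weight` are all assumptions chosen for the example, and the source does not state which layers of the diffusion model are adapted.

```python
import random


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_weight(w, a, b, alpha):
    """Return W + (alpha / r) * B @ A, the LoRA-adapted weight.

    w: frozen base weight, shape (d_out, d_in)
    a: trainable down-projection, shape (r, d_in)
    b: trainable up-projection, shape (d_out, r)
    """
    r = len(a)
    delta = matmul(b, a)  # low-rank update, shape (d_out, d_in)
    scale = alpha / r
    return [[w_ij + scale * d_ij for w_ij, d_ij in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


# Hypothetical sizes, chosen only to keep the example small.
d_out, d_in, rank = 8, 8, 2
rng = random.Random(0)

w = [[rng.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]      # frozen
a = [[rng.gauss(0, 0.01) for _ in range(d_in)] for _ in range(rank)]    # trainable
b = [[0.0] * rank for _ in range(d_out)]  # trainable; zero init makes the adapter a no-op at start

adapted = lora_weight(w, a, b, alpha=4.0)

full_params = d_out * d_in            # parameters touched by full fine-tuning
lora_params = rank * (d_in + d_out)   # parameters touched by LoRA
print(full_params, lora_params)
```

Because `b` starts at zero, the adapted weight initially equals the frozen weight, and only `a` and `b` would be updated during training. The saving in trainable parameters grows with layer size: for a square layer of width d, full fine-tuning touches d² values while rank-r LoRA touches 2rd.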
