
ResAdapter: Domain Consistent Resolution Adapter for Diffusion Models

Jiaxiang Cheng, Pan Xie, Xin Xia, Jiashi Li, Jie Wu, Yuxi Ren, Huixia Li, Xuefeng Xiao, Min Zheng, Lean Fu

TL;DR

ResAdapter addresses the challenge of generating high-quality images from personalized diffusion models at resolutions outside their training domain by introducing a domain-consistent, plug-and-play Resolution Adapter. It combines ResCLoRA (convolution-wise LoRA insertions in the downsampler and upsampler blocks), which learns resolution priors for interpolation, with ResENorm (selective normalization), which enables extrapolation. Both are trained with a mixed-resolution strategy on a frozen base model and amount to a lightweight parameter set (~$0.5$M). The approach preserves the original style domain of personalized models while enabling resolution interpolation from $128\times128$ up to $1024\times1024$ (SD1.5) and extrapolation up to $1536\times1536$ (SDXL). It is compatible with ControlNet, IP-Adapter, and LCM-LoRA, and can be integrated into ElasticDiffusion to generate higher-resolution images more efficiently. Empirical results show improved FID and CLIP scores, favorable human judgments, and notable speedups compared to post-processing multi-resolution methods, enabling practical, high-resolution generation with personalized diffusion models.
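To make the split between ResCLoRA and ResENorm concrete, the following is a minimal sketch, not the authors' implementation. It assumes hypothetical parameter names loosely modeled on a typical UNet layout, and it assumes the usual LoRA convention of adding a scaled low-rank product to a frozen weight. The helpers `adapter_role` and `merge_low_rank`, and the scaling factor `alpha`, are illustrative names introduced here.

```python
# Hypothetical parameter names, loosely modeled on a UNet layout.
# They are illustrative assumptions, not names taken from the paper.
PARAM_NAMES = [
    "down_blocks.0.resnets.0.norm1.weight",
    "down_blocks.0.resnets.0.conv1.weight",
    "down_blocks.0.downsamplers.0.conv.weight",
    "down_blocks.0.attentions.0.to_q.weight",
    "up_blocks.1.upsamplers.0.conv.weight",
    "up_blocks.1.resnets.0.norm2.bias",
]


def adapter_role(name: str) -> str:
    """Classify a parameter as 'lora', 'norm', or 'frozen'.

    'lora'   -> convolution inside a downsampler/upsampler (ResCLoRA target)
    'norm'   -> normalization layer inside a resnet block (ResENorm target)
    'frozen' -> everything else stays untouched
    """
    parts = name.split(".")
    if ("downsamplers" in parts or "upsamplers" in parts) and "conv" in parts:
        return "lora"
    if "resnets" in parts and any(p.startswith("norm") for p in parts):
        return "norm"
    return "frozen"


def merge_low_rank(weight, down, up, alpha):
    """Return weight + alpha * (up @ down) using plain nested lists.

    weight: out x in, down: rank x in, up: out x rank.
    A convolution kernel would first be flattened to this 2-D shape.
    """
    rank = len(down)
    return [
        [w + alpha * sum(up[i][r] * down[r][j] for r in range(rank))
         for j, w in enumerate(row)]
        for i, row in enumerate(weight)
    ]


roles = {name: adapter_role(name) for name in PARAM_NAMES}
```

Under these assumptions, setting `alpha` to 0 leaves the base weight unchanged, which is one way to read the adapter as plug-and-play: the personalized model's own weights are never overwritten, only offset by a small resolution-specific term.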

Abstract

Recent advances in text-to-image models (e.g., Stable Diffusion) and the corresponding personalization techniques (e.g., DreamBooth and LoRA) enable individuals to generate high-quality and imaginative images. However, these models often struggle when generating images at resolutions outside their training domain. To overcome this limitation, we present the Resolution Adapter (ResAdapter), a domain-consistent adapter designed for diffusion models to generate images with unrestricted resolutions and aspect ratios. Unlike other multi-resolution generation methods, which process images of a static resolution with complex post-processing operations, ResAdapter directly generates images at dynamic resolutions. In particular, after learning pure resolution priors from a general dataset, ResAdapter generates resolution-free images with personalized diffusion models while preserving their original style domain. Comprehensive experiments demonstrate that ResAdapter, with only 0.5M parameters, can process images with flexible resolutions for arbitrary diffusion models. Further experiments demonstrate that ResAdapter is compatible with other modules (e.g., ControlNet, IP-Adapter and LCM-LoRA) for image generation across a broad range of resolutions, and can be integrated into other multi-resolution models (e.g., ElasticDiffusion) to efficiently generate higher-resolution images. The project link is https://res-adapter.github.io

Paper Structure (22 sections, 1 equation, 6 figures, 5 tables)


Figures (6)

  • Figure 1: Motivation. We explore the domain distribution of images generated by SD1.5 and Dreamlike at resolutions of $256 \times 256$, $512 \times 512$ and $1024 \times 1024$. Dreamlike is a personalized diffusion model based on SD1.5. We find that the baselines shift domains at resolutions of $256 \times 256$ and $1024 \times 1024$. The ResAdapter and LoRA shown above are both trained on the general dataset LAION-5B. ResAdapter keeps the domain consistent across resolutions, whereas LoRA injects style priors from LAION-5B that influence the Dreamlike domain, resulting in low-quality images with a style conflict.
  • Figure 2: Overview of ResAdapter. Left: Pipeline of ResAdapter. Built on a frozen base model (e.g., SD or SDXL), ResAdapter learns resolution priors from mixed-resolution general datasets and can be integrated into arbitrary personalized models to generate multi-resolution images. Right: Architecture comparison between ResAdapter and LoRA. Unlike LoRA, ResAdapter is inserted only into the downsampler and upsampler, and it unfreezes the group normalization of the ResNet blocks.
  • Figure 3: Qualitative results. We compare multi-resolution images generated with and without ResAdapter by personalized models of arbitrary style domains. Left: images generated by ResAdapter integrated into the personalized model. Right: images generated by the original personalized model. Some prompts are edited for clarity.
  • Figure 4: Qualitative results of extended experiments with ResAdapter. For image-to-image tasks with ControlNet, the condition is a Canny image at different resolutions. For image variation tasks with IP-Adapter, we resize the input image from $1024 \times 1024$ to $256 \times 256$. For accelerated text-to-image tasks with LCM-LoRA, we generate images in 4 steps.
  • Figure 5: Ablation studies. Top: We ablate the modules of ResAdapter. Baseline represents Dreamshaper, a personalized diffusion model based on SD1.5. The third column shows the model with only ResCLoRA integrated. The fourth column shows the model with only ResENorm integrated. The fifth column shows the model with both of them integrated. Bottom: We ablate the ResAdapter alpha $\alpha_r$ from 0 to 1 at lower and higher resolutions.
  • ...and 1 more figures