Hybrid SD: Edge-Cloud Collaborative Inference for Stable Diffusion Models

Chenqian Yan, Songwei Liu, Hongjian Liu, Xurui Peng, Xiaojian Wang, Fangmin Chen, Lean Fu, Xing Mei

TL;DR

This work introduces Hybrid SD, a training-free inference framework for Stable Diffusion Models (SDMs) that splits the denoising process between cloud servers and edge devices. Its compressed edge models achieve state-of-the-art parameter efficiency, and the collaborative scheme reduces cloud cost by 66%.

Abstract

Stable Diffusion Models (SDMs) have shown remarkable proficiency in image synthesis. However, their broad application is impeded by their large model sizes and intensive computational requirements, which typically require expensive cloud servers for deployment. Conversely, while many compact models tailored for edge devices can reduce these demands, they often compromise on semantic integrity and visual quality when compared to full-sized SDMs. To bridge this gap, we introduce Hybrid SD, an innovative, training-free SDM inference framework designed for edge-cloud collaborative inference. Hybrid SD assigns the early steps of the diffusion process to the large models deployed on cloud servers, enhancing semantic planning. Small, efficient models deployed on edge devices then refine visual details in the later steps. Acknowledging the diversity of edge devices with differing computational and storage capacities, we apply structural pruning to the SDM U-Net and train a lightweight VAE. Empirical evaluations demonstrate that our compressed models achieve state-of-the-art parameter efficiency (225.8M) on edge devices with competitive image quality. Additionally, Hybrid SD reduces the cloud cost by 66% with edge-cloud collaborative inference.
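The step split described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: `hybrid_denoise`, `toy_cloud`, and `toy_edge` are hypothetical names, the two "models" are toy functions standing in for a large and a small U-Net, and the sketch assumes both models operate on the same latent space so the latent can be handed from cloud to edge after step $k$. A real pipeline would also involve a sampler and a network transfer, which are omitted here.

```python
from typing import Callable, List, Sequence, Tuple

# A denoiser maps (latent, timestep) -> updated latent.
Denoiser = Callable[[List[float], int], List[float]]


def hybrid_denoise(
    latent: List[float],
    timesteps: Sequence[int],
    cloud_model: Denoiser,
    edge_model: Denoiser,
    k: int,
) -> Tuple[List[float], List[str]]:
    """Run the first k steps on the cloud model and the rest on the edge model.

    Returns the final latent and a trace recording where each step ran.
    """
    if not 0 <= k <= len(timesteps):
        raise ValueError("k must be between 0 and the number of timesteps")

    trace: List[str] = []
    for i, t in enumerate(timesteps):
        if i < k:
            # Early steps: large model on the cloud (semantic planning).
            latent = cloud_model(latent, t)
            trace.append("cloud")
        else:
            # Later steps: small model on the edge (detail refinement).
            latent = edge_model(latent, t)
            trace.append("edge")
    return latent, trace


# Toy stand-ins for the two denoisers (assumptions, not real U-Nets).
def toy_cloud(latent: List[float], t: int) -> List[float]:
    return [0.5 * x for x in latent]


def toy_edge(latent: List[float], t: int) -> List[float]:
    return [0.9 * x for x in latent]


if __name__ == "__main__":
    steps = list(range(25, 0, -1))  # 25 sampling steps, as in Figure 1
    final, trace = hybrid_denoise([1.0, -2.0], steps, toy_cloud, toy_edge, k=10)
    print(trace.count("cloud"), "cloud steps,", trace.count("edge"), "edge steps")
```

Setting `k = 0` corresponds to edge-only inference and `k = len(timesteps)` to cloud-only inference, so the same loop covers the three cases the paper contrasts.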

Paper Structure

This paper contains 15 sections, 6 equations, 11 figures, 3 tables, 1 algorithm.

Figures (11)

  • Figure 1: Comparison of FLOPs, model size (#Params), and FID score on the MS-COCO 2014 30K dataset (Lin et al., 2014). We report FLOPs and parameters of the U-Net and VAE decoder for each model. Our proposed Hybrid SD is highlighted in red font, where $k$ indicates the number of steps running on cloud servers. For all SDMs, we use a 25-step DPM-Solver (Lu et al., 2022) sampler. Hybrid SD achieves a compelling FID with minimal parameters and computational cost. The region between the two dashed lines represents the accelerated LCM models, which use 8-step sampling by default. Hybrid SD shows exceptional compatibility with accelerated models. * denotes replacing the original VAE with our lightweight VAE on edge devices.
  • Figure 2: The overview of Hybrid SD. We distribute the inference tasks to cloud servers and edge devices. The red line denotes text-to-image tasks while the blue line denotes image-to-image tasks.
  • Figure 3: Illustration of different SDM inference processes. (a) Large SD model inference on the cloud. (b) Small SD model inference on the edge. (c) Hybrid SD inference in an edge-cloud collaborative manner.
  • Figure 4: (a) Different impacts of pruning 50% of the parameters in BK-SDM-Small without fine-tuning. (b) Evaluation score; the higher the score, the more important.
  • Figure 5: Visualizations of images generated by the SD-v1.4 VAE (left), TAESD (middle), and our VAE (right). The first row shows images reconstructed directly by the VAE, while the second row shows images decoded from latents generated by SD-v1.4 LCM. Our VAE shows competitive performance compared to the SD-v1.4 VAE while outperforming TAESD in detail generation and color saturation.
  • ...and 6 more figures