Accurate Compression of Text-to-Image Diffusion Models via Vector Quantization

Vage Egiazarian; Denis Kuznedelev; Anton Voronov; Ruslan Svirschevski; Michael Goin; Daniil Pavlov; Dan Alistarh; Dmitry Baranchuk

TL;DR

This work demonstrates that more versatile vector quantization (VQ) may achieve higher compression rates for large-scale text-to-image diffusion models, and tailors vector-based post-training quantization (PTQ) methods to recent billion-scale text-to-image models (SDXL and SDXL-Turbo).

Abstract

Text-to-image diffusion models have emerged as a powerful framework for high-quality image generation given textual prompts. Their success has driven the rapid development of production-grade diffusion models that consistently increase in size and already contain billions of parameters. As a result, state-of-the-art text-to-image models are becoming less accessible in practice, especially in resource-limited environments. Post-training quantization (PTQ) tackles this issue by compressing the pretrained model weights into lower-bit representations. Recent diffusion quantization techniques primarily rely on uniform scalar quantization, providing decent performance for models compressed to 4 bits. This work demonstrates that more versatile vector quantization (VQ) may achieve higher compression rates for large-scale text-to-image diffusion models. Specifically, we tailor vector-based PTQ methods to recent billion-scale text-to-image models (SDXL and SDXL-Turbo), and show that diffusion models of 2B+ parameters compressed to around 3 bits using VQ exhibit image quality and textual alignment similar to those of previous 4-bit compression techniques.
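To make the idea of vector quantization concrete, the following is a minimal sketch, not the method proposed in the paper. It assumes a plain k-means codebook over fixed-size groups of weights; the function names (`vq_compress`, `vq_decompress`, `bits_per_weight`) and all parameter values are hypothetical and chosen only for illustration. Each group of `group_size` weights is replaced by a single `codebook_bits`-bit index into a shared codebook, so the cost per weight is roughly `codebook_bits / group_size` plus the amortized codebook storage.

```python
import numpy as np


def _nearest_codes(vectors, codebook):
    # Squared Euclidean distance from every vector to every codebook entry.
    dists = (
        (vectors ** 2).sum(axis=1, keepdims=True)
        - 2.0 * vectors @ codebook.T
        + (codebook ** 2).sum(axis=1)
    )
    return dists.argmin(axis=1)


def vq_compress(weight, group_size=8, codebook_bits=8, iters=10, seed=0):
    """Toy VQ of a weight matrix with plain k-means (illustrative assumption)."""
    rng = np.random.default_rng(seed)
    vectors = weight.reshape(-1, group_size)
    k = 2 ** codebook_bits
    # Initialize the codebook from k distinct weight groups.
    init = rng.choice(len(vectors), size=k, replace=False)
    codebook = vectors[init].copy()
    for _ in range(iters):
        codes = _nearest_codes(vectors, codebook)
        for j in range(k):
            members = vectors[codes == j]
            if len(members):  # leave empty clusters unchanged
                codebook[j] = members.mean(axis=0)
    return _nearest_codes(vectors, codebook), codebook


def vq_decompress(codes, codebook, shape):
    # Look up each index in the codebook and restore the matrix shape.
    return codebook[codes].reshape(shape)


def bits_per_weight(num_weights, group_size, codebook_bits, codebook_dtype_bits=16):
    # Index storage plus codebook storage, averaged over all weights.
    index_bits = num_weights / group_size * codebook_bits
    codebook_storage = (2 ** codebook_bits) * group_size * codebook_dtype_bits
    return (index_bits + codebook_storage) / num_weights
```

For example, calling `vq_compress` on a random 256×256 matrix with the defaults yields 8192 indices and a 256×8 codebook, and `vq_decompress` rebuilds an approximation of the original matrix. Calibration and fine-tuning of the quantized model, which the paper's outline lists as separate steps, are not covered by this sketch.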

Paper Structure (26 sections, 1 equation, 5 figures, 5 tables, 2 algorithms)

Introduction
Contributions.
Related work
Efficient diffusion models.
Model quantization.
Quantization of diffusion models.
Method
Vector Quantization of Text-to-Image Models
Calibrating Vector-Quantized Diffusion Models
Layer-wise calibration.
Global fine-tuning.
Inference procedure
Experiments
Experimental setup
Comparison with baseline methods
...and 11 more sections

Figures (5)

Figure 1: Overview of the proposed layer-wise calibration procedure before fine-tuning.
Figure 2: Qualitative comparison of SDXL compressed with VQDM and the baselines.
Figure 3: Human preference study. Left: comparison between VQDM and the baselines. Right: comparison between the quantized and full-precision models.
Figure 4: Qualitative comparison of SDXL-Turbo quantized with VQDM and the full-precision model for different sampling steps.
Figure 5: Side-by-side comparison interface for text-to-image human evaluation.
