QuantFace: Efficient Quantization for Face Restoration

Jiatong Li, Libo Zhu, Haotong Qin, Jingkai Wang, Linghe Kong, Guihai Chen, Yulun Zhang, Xiaokang Yang

TL;DR

QuantFace addresses the cost of deploying one-step diffusion face restoration (OSDFR) models on resource-constrained devices with a low-bit quantization framework. It combines three components: rotation-scaling channel balancing to smooth activation distributions, quantization-distillation low-rank adaptation (QD-LoRA) to align the quantized model with its full-precision counterpart, and adaptive bit-width allocation, formulated as an integer programming problem, to assign precision per layer. The method quantizes weights and activations to 4–6 bits and, at 4-bit, reports an 84.85% parameter compression and an 82.91% speedup relative to the full-precision OSDFace, while outperforming prior quantization approaches on both synthetic and real-world face datasets. The visual results indicate that QuantFace preserves facial detail (eyes, hair) and structural fidelity under aggressive quantization, which makes on-device deployment of OSDFR models more practical.

Abstract

Diffusion models have achieved remarkable performance in face restoration. However, their heavy computation hampers widespread adoption. In this work, we propose QuantFace, a novel low-bit quantization framework for face restoration models, in which the full-precision (i.e., 32-bit) weights and activations are quantized to 4–6 bits. We first analyze the data distribution within activations and find that it varies widely. To preserve the original data information, we employ rotation-scaling channel balancing. Furthermore, we propose Quantization-Distillation Low-Rank Adaptation (QD-LoRA), which jointly optimizes for quantization and distillation performance. Finally, we propose an adaptive bit-width allocation strategy, formulated as an integer programming problem that combines quantization error and perceptual metrics to find a satisfactory resource allocation. Extensive experiments on synthetic and real-world datasets demonstrate the effectiveness of QuantFace under 6-bit and 4-bit settings. QuantFace achieves significant advantages over recent leading low-bit quantization methods for face restoration. The code is available at https://github.com/jiatongli2024/QuantFace.
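
To make the channel-balancing idea concrete, the following is a minimal NumPy sketch, not the authors' implementation. It assumes a single linear layer, a generic per-channel scaling folded into the weights, a normalized Hadamard matrix as the rotation, and a symmetric per-tensor 4-bit quantizer applied to activations only. The tensor sizes, the outlier channel, and the helper names `fake_quantize`, `hadamard`, and `output_mse` are all hypothetical.

```python
import numpy as np

def fake_quantize(x, bits):
    """Symmetric per-tensor uniform quantization, then dequantization."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.max(np.abs(x)) / qmax
    return np.clip(np.round(x / scale), -qmax - 1, qmax) * scale

def hadamard(n):
    """Orthonormal Hadamard matrix (Sylvester construction, n a power of two)."""
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)

rng = np.random.default_rng(0)
tokens, channels, out_features = 256, 64, 32
x = rng.normal(size=(tokens, channels))
x[:, 3] *= 50.0                        # hypothetical outlier channel
w = rng.normal(size=(channels, out_features))
y_ref = x @ w                          # full-precision output

# Scaling: divide each activation channel by s and fold s into the weight rows.
s = np.sqrt(np.max(np.abs(x), axis=0))
# Rotation: multiply activations by H and fold H^T into the weights.
h = hadamard(channels)
x_bal = (x / s) @ h
w_bal = h.T @ (s[:, None] * w)

# Before quantization the balanced layer computes exactly the same output.
assert np.allclose(x_bal @ w_bal, y_ref)

def output_mse(x_in, w_in):
    """Output error when only the activations are quantized to 4 bits."""
    return float(np.mean((fake_quantize(x_in, 4) @ w_in - y_ref) ** 2))

err_plain = output_mse(x, w)
err_balanced = output_mse(x_bal, w_bal)
print(f"output MSE at 4-bit: plain={err_plain:.3f}, balanced={err_balanced:.3f}")
```

In this toy setting the outlier channel forces a coarse quantization step on every other channel. Scaling shrinks the outlier and the rotation spreads what remains across all channels, so the same 4-bit grid covers the values more evenly.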

Paper Structure

This paper contains 26 sections, 14 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: Visual comparison between the multi-step diffusion and one-step diffusion face restoration models in full-precision, recent quantization methods in 4-bit, and our QuantFace in 4-bit. Our method achieves an 84.85% parameter compression and an 82.91% speedup compared with the full-precision OSDFace.
  • Figure 2: The original activation has high variance. Our QuantFace smooths the activation distribution and reduces quantization error.
  • Figure 3: Overview of our QuantFace. First, under the 4-bit precision setting, we use the quantization errors and perceptual importance weights as the objective for integer programming, and allocate appropriate precision to the activations of each layer. Second, before training, we integrate the scaling factor and rotation matrix into the weights, and only apply an online RHT in the convolution layers. Third, we align the quantized model with the FP model on the calibration dataset by optimizing the dual-branch low-rank matrices we design.
  • Figure 5: The change in FID on FFHQ when quantizing the activation of each layer individually at 4-bit precision. Different layers exhibit varying levels of sensitivity to quantization. Downsampling, upsampling, and residual connections are bottlenecks for quantization.
  • Figure 6: Visual comparison of the synthetic CelebA-Test dataset in challenging cases.
  • ...and 6 more figures
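
The bit-width allocation step described in the Figure 3 caption can be illustrated with a small integer program. The sketch below is an assumption-laden toy, not the paper's formulation: the layer names, activation sizes, perceptual weights, per-bit quantization errors, candidate bit-widths, and the average-bit budget are all invented for illustration, and the problem is solved by exhaustive search rather than a dedicated solver. The objective is the perceptually weighted quantization error, minimized subject to a size-weighted bit budget.

```python
from itertools import product

# Hypothetical per-layer data (none of these numbers come from the paper):
# (name, relative activation size, perceptual weight, error at each bit-width)
layers = [
    ("down",     4, 3.0, {4: 0.90, 6: 0.20, 8: 0.05}),
    ("mid_attn", 2, 1.0, {4: 0.30, 6: 0.08, 8: 0.02}),
    ("up",       4, 2.0, {4: 0.70, 6: 0.15, 8: 0.04}),
    ("out_conv", 2, 0.5, {4: 0.20, 6: 0.05, 8: 0.01}),
]
candidate_bits = (4, 6, 8)
avg_bit_budget = 5.0  # assumed size-weighted average bit-width limit

def allocate(layers, candidate_bits, avg_bit_budget):
    """Exhaustively solve: minimize sum_l weight_l * err_l(b_l)
    subject to sum_l size_l * b_l <= avg_bit_budget * sum_l size_l."""
    total_size = sum(size for _, size, _, _ in layers)
    best_cost, best_choice = float("inf"), None
    for choice in product(candidate_bits, repeat=len(layers)):
        used = sum(size * b for (_, size, _, _), b in zip(layers, choice))
        if used > avg_bit_budget * total_size:
            continue  # violates the bit budget
        cost = sum(wt * err[b] for (_, _, wt, err), b in zip(layers, choice))
        if cost < best_cost:
            best_cost, best_choice = cost, choice
    return best_choice, best_cost

choice, cost = allocate(layers, candidate_bits, avg_bit_budget)
for (name, _, _, _), bits in zip(layers, choice):
    print(f"{name}: {bits}-bit")
print(f"weighted error: {cost:.3f}")
```

Exhaustive search is only feasible for a handful of layers; a real network would call for an integer programming solver or dynamic programming. With these made-up numbers the budget goes to the layers whose error, weighted by perceptual importance, drops the most per extra bit.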