Efficient Face Super-Resolution via Wavelet-based Feature Enhancement Network

Wenjie Li, Heng Guo, Xuannan Liu, Kongming Liang, Jiani Hu, Zhanyu Ma, Jun Guo

TL;DR

This work tackles face super-resolution (FSR) by mitigating distortions caused by downsampling in encoder–decoder models. It introduces WFEN, a wavelet-based feature enhancement network that uses the discrete wavelet transform to split features into low- and high-frequency bands and a full-domain Transformer (FDT) to capture local, regional, and global facial information. The main contributions are the Wavelet Feature Downsample (WFD) and Wavelet Feature Upgrade (WFU) modules, which reduce the distortion introduced by downsampling and upsampling, and the FDT for cross-scale feature fusion; together they achieve a favorable balance between performance, model size, and speed. Experiments on CelebA, Helen, and SCface demonstrate improved fidelity (PSNR/SSIM/LPIPS/VIF) and identity preservation with lower computational cost, highlighting the method’s practical potential for real-world FSR and surveillance applications.

Abstract

Face super-resolution aims to reconstruct a high-resolution face image from a low-resolution face image. Previous methods typically employ an encoder-decoder structure to extract facial structural features, where the direct downsampling inevitably introduces distortions, especially to high-frequency features such as edges. To address this issue, we propose a wavelet-based feature enhancement network, which mitigates feature distortion by losslessly decomposing the input feature into high- and low-frequency components using the wavelet transform and processing them separately. To improve the efficiency of facial feature extraction, a full-domain Transformer is further proposed to enhance local, regional, and global facial features. Such designs allow our method to perform better without stacking many modules as previous methods did. Experiments show that our method effectively balances performance, model size, and speed. Code link: https://github.com/PRIS-CV/WFEN.
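
To make the "lossless decomposition" idea concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes a single-level Haar wavelet (the abstract does not name the wavelet family), and the helper names haar_dwt2 and haar_idwt2 are hypothetical. It shows that the four half-resolution sub-bands together reconstruct the input exactly, whereas 2×2 average pooling keeps only a scaled copy of the low-frequency band.

```python
# Single-level 2D Haar transform on a feature map stored as a list of rows.
# Assumptions: Haar wavelet, even height and width; haar_dwt2 / haar_idwt2
# are illustrative helper names, not functions from the WFEN repository.

def haar_dwt2(x):
    """Split a 2D map into four half-resolution sub-bands."""
    low, d_lr, d_tb, d_diag = [], [], [], []
    for i in range(0, len(x), 2):
        r_low, r_lr, r_tb, r_diag = [], [], [], []
        for j in range(0, len(x[0]), 2):
            a, b = x[i][j], x[i][j + 1]          # top-left, top-right
            c, d = x[i + 1][j], x[i + 1][j + 1]  # bottom-left, bottom-right
            r_low.append((a + b + c + d) / 2)    # low-frequency band
            r_lr.append((a - b + c - d) / 2)     # detail: left minus right
            r_tb.append((a + b - c - d) / 2)     # detail: top minus bottom
            r_diag.append((a - b - c + d) / 2)   # detail: diagonal
        low.append(r_low)
        d_lr.append(r_lr)
        d_tb.append(r_tb)
        d_diag.append(r_diag)
    return low, d_lr, d_tb, d_diag


def haar_idwt2(low, d_lr, d_tb, d_diag):
    """Invert haar_dwt2, recovering the full-resolution map."""
    out = []
    for r_low, r_lr, r_tb, r_diag in zip(low, d_lr, d_tb, d_diag):
        top, bottom = [], []
        for p, q, r, s in zip(r_low, r_lr, r_tb, r_diag):
            top += [(p + q + r + s) / 2, (p - q + r - s) / 2]
            bottom += [(p + q - r - s) / 2, (p - q - r + s) / 2]
        out += [top, bottom]
    return out


def avg_pool2(x):
    """2x2 average pooling with stride 2, for comparison."""
    return [[(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4
             for j in range(0, len(x[0]), 2)]
            for i in range(0, len(x), 2)]


# Two toy maps with identical 2x2 block averages but different detail.
edges = [[1, 3, 2, 2], [5, 7, 2, 2], [0, 0, 9, 1], [0, 0, 1, 9]]
flat = [[4, 4, 2, 2], [4, 4, 2, 2], [0, 0, 5, 5], [0, 0, 5, 5]]

bands = haar_dwt2(edges)
restored = haar_idwt2(*bands)                 # same values as `edges`
same_after_pooling = avg_pool2(edges) == avg_pool2(flat)
same_after_dwt = haar_dwt2(edges) == haar_dwt2(flat)
```

In this toy case the two maps are indistinguishable after average pooling but differ in their detail bands, which is the kind of high-frequency information the abstract says direct downsampling distorts. A downsampling module built on such a transform can halve the spatial resolution while carrying the detail bands forward for separate processing.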

Paper Structure (21 sections, 8 equations, 9 figures, 5 tables)

Figures (9)

  • Figure 1: Efficiency trade-offs between our method and existing methods on the CelebA liu2015deep test set. Our method achieves a balance in terms of PSNR, model size, and speed.
  • Figure 2: Feature maps (first row) and FSR results (second row) with various downsampling methods: bicubic interpolation, stride convolution, average pooling, and our wavelet feature downsample. The loss of high-frequency features is pronounced in (a) and (b), while frequency-domain feature aliasing appears in (c). Ours is effective in preventing feature loss and frequency-domain aliasing.
  • Figure 3: Overview of our method, where the cascaded WFD and WFU modules constitute the wavelet-based encoder-decoder structure.
  • Figure 4: Architecture of our full-domain Transformer, which can handle local, regional, and global facial features.
  • Figure 5: Qualitative comparison for $\times 8$ FSR on CelebA liu2015deep and Helen le2012interactive test sets. Our method recovers detailed face images.
  • ...and 4 more figures