Arbitrary-Resolution and Arbitrary-Scale Face Super-Resolution with Implicit Representation Networks

Yi Ting Tsai, Yu Wei Chen, Hong-Han Shuai, Ching-Chun Huang

TL;DR

ARASFSR addresses two limitations of face super-resolution, fixed up-sampling scales and sensitivity to input resolution, with an implicit representation framework that supports arbitrary-resolution inputs and arbitrary-scale outputs. It predicts per-pixel RGB values from 2D deep features, local coordinates, and scale ratios, augmented by a Local Frequency Estimation Module that captures high-frequency texture and a Global Coordinate Modulation Module that leverages facial priors. The approach combines feature unfolding, local ensemble, and a skip connection for robustness across varying inputs, and is compared against INR-based SISR and conventional FSR methods on multiple datasets. Results show strong generalization to unseen scales and real-world artifacts, highlighting ARASFSR’s practical potential for diverse face-centric applications.

Abstract

Face super-resolution (FSR) is a critical technique for enhancing low-resolution facial images and has significant implications for face-related tasks. However, existing FSR methods are limited by fixed up-sampling scales and sensitivity to input size variations. To address these limitations, this paper introduces an Arbitrary-Resolution and Arbitrary-Scale FSR method with implicit representation networks (ARASFSR), featuring three novel designs. First, ARASFSR employs 2D deep features, local relative coordinates, and up-sampling scale ratios to predict RGB values for each target pixel, allowing super-resolution at any up-sampling scale. Second, a local frequency estimation module captures high-frequency facial texture information to reduce the spectral bias effect. Lastly, a global coordinate modulation module guides FSR to leverage prior facial structure knowledge and achieve resolution adaptation effectively. Quantitative and qualitative evaluations demonstrate the robustness of ARASFSR over existing state-of-the-art methods while super-resolving facial images across various input sizes and up-sampling scales.
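
The first design (predicting each target pixel from deep features, local relative coordinates, and the up-sampling scale ratio) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`pixel_centers`, `build_queries`, `decode`), the nearest-cell lookup, the two-layer decoder, and the random weights are all assumptions made for illustration, and the sketch leaves out feature unfolding, the local ensemble, the local frequency estimation module, and the global coordinate modulation module. It only shows why a coordinate-based decoder is not tied to a fixed output size.

```python
import numpy as np

def pixel_centers(n):
    """Centers of n equally sized cells spanning [-1, 1]."""
    return -1.0 + (2.0 * np.arange(n) + 1.0) / n

def build_queries(feat, out_h, out_w):
    """Per-pixel decoder input: [nearest feature, relative coord, scale ratio].

    feat is a hypothetical low-resolution feature map of shape (H, W, C).
    """
    in_h, in_w, _ = feat.shape
    ys, xs = pixel_centers(out_h), pixel_centers(out_w)
    # Index of the low-resolution cell containing each target row / column.
    iy = np.clip(np.floor((ys + 1.0) / 2.0 * in_h).astype(int), 0, in_h - 1)
    ix = np.clip(np.floor((xs + 1.0) / 2.0 * in_w).astype(int), 0, in_w - 1)
    # Offset from that cell's center, in units of low-resolution cells.
    rel_y = (ys - pixel_centers(in_h)[iy]) * in_h / 2.0
    rel_x = (xs - pixel_centers(in_w)[ix]) * in_w / 2.0
    rel = np.stack(np.meshgrid(rel_y, rel_x, indexing="ij"), axis=-1)
    # Up-sampling ratio per axis, repeated for every target pixel.
    scale = np.broadcast_to(
        np.array([out_h / in_h, out_w / in_w]), (out_h, out_w, 2)
    )
    nearest = feat[iy[:, None], ix[None, :]]
    return np.concatenate([nearest, rel, scale], axis=-1)

def decode(queries, w1, b1, w2, b2):
    """Toy two-layer MLP applied independently to every target pixel."""
    hidden = np.maximum(queries @ w1 + b1, 0.0)
    return hidden @ w2 + b2

rng = np.random.default_rng(0)
feat = rng.standard_normal((16, 16, 8))  # stand-in for encoder output
w1, b1 = rng.standard_normal((12, 32)) * 0.1, np.zeros(32)
w2, b2 = rng.standard_normal((32, 3)) * 0.1, np.zeros(3)

# The same decoder serves any target size, including non-integer scales.
for out_size in [(32, 32), (40, 56)]:
    rgb = decode(build_queries(feat, *out_size), w1, b1, w2, b2)
    print(out_size, rgb.shape)
```

Because the decoder consumes one query vector per target pixel, changing the output resolution only changes how many queries are built; no layer has a shape that depends on the scale factor.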

Paper Structure

This paper contains 19 sections, 13 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: Comparison of our proposed arbitrary-resolution and arbitrary-scale face super-resolution (ARASFSR) method with single image super-resolution (SISR) methods, face super-resolution (FSR) methods, and implicit neural representation (INR) based SISR methods. The figure highlights the differences in patch-based training in SISR versus the need for the whole low-resolution face image as input to match the test-time input distribution in FSR, resulting in a global approach instead of SISR's local view. Conventional FSR methods have limited practical applications due to their fixed up-sampling scales and sensitivity to input resolution variations, while applying INR-based SISR methods to face images is not straightforward due to the lack of a global view. To address these issues, we propose ARASFSR, which utilizes implicit representation networks to produce facial images of any resolution and can handle changes in input resolution.
  • Figure 2: Overall architecture of the proposed framework.
  • Figure 3: Visual comparison on CelebAHQ with INR-based SISR methods.
  • Figure 4: Visual comparison of real-world case on CelebAHQ-NN-JPEG with INR-based SISR methods.
  • Figure 5: Visual comparison of real-world case on SCface with INR-based SISR methods. Please zoom in for details.
  • ...and 2 more figures