Arbitrary-Resolution and Arbitrary-Scale Face Super-Resolution with Implicit Representation Networks
Yi Ting Tsai, Yu Wei Chen, Hong-Han Shuai, Ching-Chun Huang
TL;DR
ARASFSR addresses two limitations of existing face super-resolution (FSR) methods, fixed up-sampling scales and sensitivity to input resolution, with an implicit representation framework that supports arbitrary input resolutions and arbitrary output scales. It predicts the RGB value of each target pixel from 2D deep features, local relative coordinates, and up-sampling scale ratios. A Local Frequency Estimation Module captures high-frequency facial texture, and a Global Coordinate Modulation Module lets the network exploit facial structure priors. Feature unfolding, local ensemble, and a skip connection add robustness to varying inputs. Experiments on multiple datasets compare ARASFSR with implicit neural representation (INR)-based single-image super-resolution (SISR) methods and conventional FSR methods, and the results show generalization to unseen scales and real-world artifacts, which suggests practical value for face-related applications.
Abstract
Face super-resolution (FSR) is a critical technique for enhancing low-resolution facial images and has significant implications for face-related tasks. However, existing FSR methods are limited by fixed up-sampling scales and sensitivity to input size variations. To address these limitations, this paper introduces an Arbitrary-Resolution and Arbitrary-Scale FSR method with implicit representation networks (ARASFSR), featuring three novel designs. First, ARASFSR employs 2D deep features, local relative coordinates, and up-sampling scale ratios to predict RGB values for each target pixel, allowing super-resolution at any up-sampling scale. Second, a local frequency estimation module captures high-frequency facial texture information to reduce the spectral bias effect. Third, a global coordinate modulation module guides FSR to leverage prior facial structure knowledge and achieve resolution adaptation effectively. Quantitative and qualitative evaluations demonstrate that ARASFSR is more robust than existing state-of-the-art methods when super-resolving facial images across various input sizes and up-sampling scales.
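The first design can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`pixel_centers`, `build_queries`, `tiny_mlp`), the nearest-cell lookup, the [-1, 1] coordinate convention, and the randomly initialized two-layer decoder are all assumptions made for illustration. The sketch only shows how a deep feature, a local relative coordinate, and a scale ratio could be assembled for every target pixel so that the output size need not be an integer multiple of the input size.

```python
import numpy as np

def pixel_centers(n):
    """Centers of n equal cells tiling the interval [-1, 1]."""
    return -1.0 + (2.0 * np.arange(n) + 1.0) / n

def build_queries(feat, out_h, out_w):
    """Per-target-pixel inputs: nearest feature, relative coords, scale ratios."""
    h, w, _ = feat.shape
    ys, xs = pixel_centers(out_h), pixel_centers(out_w)  # target pixel centers
    # Index of the low-resolution cell that contains each target coordinate.
    iy = np.clip(np.floor((ys + 1.0) / 2.0 * h).astype(int), 0, h - 1)
    ix = np.clip(np.floor((xs + 1.0) / 2.0 * w).astype(int), 0, w - 1)
    # Offset from that cell's center, measured in units of one low-res cell.
    rel_y = (ys - pixel_centers(h)[iy]) * h / 2.0
    rel_x = (xs - pixel_centers(w)[ix]) * w / 2.0
    nearest = feat[iy[:, None], ix[None, :]]  # shape (out_h, out_w, C)
    rel = np.stack(np.broadcast_arrays(rel_y[:, None], rel_x[None, :]), axis=-1)
    # Up-sampling scale ratio per axis, repeated for every target pixel.
    scale = np.broadcast_to(np.array([out_h / h, out_w / w]), (out_h, out_w, 2))
    return np.concatenate([nearest, rel, scale], axis=-1)

def tiny_mlp(x, rng, hidden=16):
    """Hypothetical, randomly initialized decoder standing in for a trained one."""
    w1 = rng.standard_normal((x.shape[-1], hidden)) * 0.1
    w2 = rng.standard_normal((hidden, 3)) * 0.1
    return np.maximum(x @ w1, 0.0) @ w2  # ReLU hidden layer, 3 RGB outputs

rng = np.random.default_rng(0)
feat = rng.standard_normal((4, 4, 8))   # stand-in for encoder features (h, w, C)
queries = build_queries(feat, 10, 7)    # non-integer scales: 2.5x and 1.75x
rgb = tiny_mlp(queries, rng)
print(queries.shape, rgb.shape)
```

Here a 4x4 feature map is queried on a 10x7 grid, so each query vector holds 8 feature channels, 2 relative coordinates, and 2 scale ratios, and the decoder returns one RGB triple per target pixel. The local frequency estimation and global coordinate modulation modules are deliberately left out, since the abstract does not specify their internals.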
