AffectSRNet: Facial Emotion-Aware Super-Resolution Network
Syed Sameen Ahmad Rizvi, Soham Kumar, Aryan Seth, Pratik Narang
TL;DR
This work tackles the challenge of degraded facial emotion recognition in low-resolution imagery by introducing AffectSRNet, a facial emotion-aware super-resolution framework that preserves expressions during upscaling. It combines an RRDB-based (Residual-in-Residual Dense Block) SR backbone with a Graph Convolutional Network over 478 facial landmarks and a Multimodal Split Attention Fusion module that injects structural landmark information into the reconstruction process. Training uses a multi-term loss that blends pixel, perceptual (style), and graph-based constraints, steering the model toward both high visual quality and expression fidelity. Emotion preservation is evaluated with a dedicated Emotion Consistency Metric (ECM), defined as $ECM = \alpha L_H + \log(L_{\text{conf}})$ with $\alpha = 0.5$. Experimental results on CelebA, FFHQ, and Helen show competitive image quality metrics and superior emotion fidelity (ECM) compared with state-of-the-art FSR methods, indicating strong potential for practical FER deployment in suboptimal resolution environments.
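As a minimal sketch of the ECM formula above, the snippet below evaluates $\alpha L_H + \log(L_{\text{conf}})$ for given inputs. This summary does not specify how $L_H$ and $L_{\text{conf}}$ are computed, so they are treated here as precomputed scalars. The function name `emotion_consistency_metric` and the positivity check on `l_conf` are illustrative assumptions, not part of the authors' code.

```python
import math


def emotion_consistency_metric(l_h: float, l_conf: float, alpha: float = 0.5) -> float:
    """Evaluate ECM = alpha * L_H + log(L_conf).

    l_h and l_conf are assumed to be precomputed scalars; how they are
    derived from FER predictions is not specified in this summary.
    """
    # The natural logarithm is only defined for positive arguments.
    if l_conf <= 0:
        raise ValueError("l_conf must be positive for log(l_conf) to be defined")
    return alpha * l_h + math.log(l_conf)


if __name__ == "__main__":
    # Hypothetical input values, chosen only to exercise the formula.
    print(emotion_consistency_metric(l_h=0.2, l_conf=1.0))
```

With `l_conf = 1.0` the logarithmic term vanishes, so the result reduces to `alpha * l_h`.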
Abstract
Facial expression recognition (FER) systems in low-resolution settings face significant challenges in accurately identifying expressions due to the loss of fine-grained facial details. This limitation is especially problematic for applications like surveillance and mobile communications, where low image resolution is common and can compromise recognition accuracy. Traditional single-image face super-resolution (FSR) techniques, however, often fail to preserve the emotional intent of expressions, introducing distortions that obscure the original affective content. Given the inherently ill-posed nature of single-image super-resolution, a targeted approach is required to balance image quality enhancement with emotion retention. In this paper, we propose AffectSRNet, a novel emotion-aware super-resolution framework that reconstructs high-quality facial images from low-resolution inputs while maintaining the intensity and fidelity of facial expressions. Our method effectively bridges the gap between image resolution and expression accuracy by employing an expression-preserving loss function, specifically tailored for FER applications. Additionally, we introduce a new metric to assess emotion preservation in super-resolved images, providing a more nuanced evaluation of FER system performance in low-resolution scenarios. Experimental results on standard datasets, including CelebA, FFHQ, and Helen, demonstrate that AffectSRNet outperforms existing FSR approaches in both visual quality and emotion fidelity, highlighting its potential for integration into practical FER applications. This work not only improves image clarity but also ensures that emotion-driven applications retain their core functionality in suboptimal resolution environments, paving the way for broader adoption in FER systems.
