Global Multiple Extraction Network for Low-Resolution Facial Expression Recognition
Jingyi Shi
TL;DR
This work targets the challenge of recognizing facial expressions in low-resolution images by introducing GME-Net, a dual-branch architecture that combines a hybrid attention-based local feature extractor with a multi-scale global feature extractor, guided by attention-similarity knowledge distillation from a high-resolution teacher. The local branch employs Mixed-Attention Blocks with a Depthwise Block Attention Mechanism to capture fine-grained details, while the global branch uses Mixed-Channel Feature Extraction Blocks with a quasi-symmetric design to robustly model global cues. A distillation loss transfers relevant attention information from the HR teacher to the LR student, promoting consistent feature focus across resolutions. Experiments on downsampled benchmarks (RAF-DB, ExpW, FER2013, FERPlus) show GME-Net achieving superior or competitive accuracy with favorable efficiency, indicating improved robustness for LR-FER in practical scenarios.
Abstract
Facial expression recognition, a vital computer vision task, has attracted significant attention and extensive research. Although facial expression recognition algorithms perform impressively on high-resolution images, their effectiveness tends to degrade on low-resolution images. We attribute this degradation to two causes: 1) low-resolution images lack detail information; 2) current methods perform only weak global modeling, which makes it difficult to extract discriminative features. To alleviate these issues, we propose a novel global multiple extraction network (GME-Net) for low-resolution facial expression recognition, which incorporates 1) a hybrid attention-based local feature extraction module with attention similarity knowledge distillation to learn image details from a high-resolution network; 2) a multi-scale global feature extraction module with a quasi-symmetric structure to mitigate the influence of local image noise and facilitate capturing global image features. As a result, GME-Net is capable of extracting expression-related discriminative features. Extensive experiments on several widely used datasets demonstrate that the proposed GME-Net recognizes low-resolution facial expressions more accurately and outperforms existing solutions.
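To make the idea of attention similarity knowledge distillation concrete, the following is a minimal sketch, not the GME-Net implementation. It assumes the loss is the mean cosine distance between corresponding attention maps of the high-resolution teacher and the low-resolution student; the abstract does not state the exact formulation, and the names `cosine_similarity` and `attention_similarity_loss` are hypothetical.

```python
import math


def cosine_similarity(a, b, eps=1e-8):
    """Cosine similarity between two equal-length flat vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    # eps guards against division by zero for all-zero attention maps.
    return dot / (norm_a * norm_b + eps)


def attention_similarity_loss(teacher_maps, student_maps):
    """Mean cosine distance over paired 2-D attention maps.

    Assumption: each element of teacher_maps / student_maps is a 2-D
    list (height x width) taken from matching layers of the two
    networks, already resized to the same spatial shape.
    """
    if len(teacher_maps) != len(student_maps):
        raise ValueError("teacher and student must provide the same number of maps")
    distances = []
    for t_map, s_map in zip(teacher_maps, student_maps):
        flat_t = [v for row in t_map for v in row]
        flat_s = [v for row in s_map for v in row]
        if len(flat_t) != len(flat_s):
            raise ValueError("paired attention maps must have the same shape")
        # Distance is 0 when the maps point the same way, 1 when orthogonal.
        distances.append(1.0 - cosine_similarity(flat_t, flat_s))
    return sum(distances) / len(distances)
```

Under this assumption, the loss is near zero when the student attends to the same regions as the teacher and grows as their attention patterns diverge. In training, such a term would typically be added to the classification loss with a weighting factor, which is likewise an assumption here.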
