TransFace++: Rethinking the Face Recognition Paradigm with a Focus on Accuracy, Efficiency, and Security
Jun Dan, Yang Liu, Baigui Sun, Jiankang Deng, Shan Luo
TL;DR
This paper addresses three FR challenges: (1) the limited ability of CNNs to model global facial features, (2) RGB decoding overhead that harms inference efficiency, and (3) privacy risks from feeding raw RGB images to the model. It introduces TransFace, a ViT-based FR backbone that uses a patch-level data augmentation strategy (DPAP) and an entropy-guided hard sample mining strategy (EHSM) to improve accuracy and robustness. It also introduces TransFace++, a privacy-preserving variant that operates directly on image bytes using Topology-based Image Bytes Compression (TIBC) and Structure Information-guided Cross-Attention (SICA). TransFace achieves performance competitive with or superior to RGB-based and ViT baselines on major benchmarks, while TransFace++ achieves strong FR accuracy from encrypted image bytes, showing promise for privacy-preserving deployment. The optimization objectives include $\mathcal{L}_{cls}^{trans}$ and $\mathcal{L}_{cls}^{byte}$, for learning from RGB patches and image bytes, respectively. Together, these methods improve FR accuracy, efficiency, and security, and open avenues for byte-based FR pipelines and privacy-preserving architectures.
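To make the idea of "entropy-guided" hard sample mining concrete, the sketch below uses the Shannon entropy of a sample's softmax prediction as a hardness score and up-weights uncertain samples in a cross-entropy loss. This is an illustrative assumption, not the paper's EHSM formulation: the normalization by log(C), the `(1 + hardness)` weighting, and every function name here are hypothetical.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax over one sample's class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def prediction_entropy(logits: Sequence[float]) -> float:
    """Shannon entropy (in nats) of the softmax distribution."""
    probs = softmax(logits)
    return -sum(p * math.log(p) for p in probs if p > 0.0)


def hardness_weights(batch_logits: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical hardness score: entropy divided by its maximum, log(C),
    so each weight lies in [0, 1]; higher means the model is less certain."""
    weights = []
    for logits in batch_logits:
        num_classes = len(logits)
        if num_classes < 2:
            weights.append(0.0)  # a single class carries no uncertainty
            continue
        weights.append(prediction_entropy(logits) / math.log(num_classes))
    return weights


def weighted_cross_entropy(
    batch_logits: Sequence[Sequence[float]], labels: Sequence[int]
) -> float:
    """Batch-averaged cross-entropy where each sample is scaled by
    (1 + hardness), so uncertain ("hard") samples contribute more."""
    weights = hardness_weights(batch_logits)
    total = 0.0
    for logits, y, w in zip(batch_logits, labels, weights):
        p_true = softmax(logits)[y]
        total += (1.0 + w) * -math.log(p_true)
    return total / len(labels)
```

For example, `hardness_weights([[10.0, 0.0, 0.0], [0.0, 0.0, 0.0]])` gives a weight near 0 for the confident first sample and exactly the maximum of 1.0 for the uniform second sample, so the second sample's loss term is doubled.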
Abstract
Face Recognition (FR) technology has made significant strides with the emergence of deep learning. Most existing FR models are built on Convolutional Neural Networks (CNNs) and take RGB face images as input. In this work, we take a closer look at existing FR paradigms from the perspectives of efficiency, security, and accuracy, and identify three problems: (i) CNN frameworks are weak at capturing global facial features and at modeling the correlations between local facial features. (ii) Using RGB face images as input degrades inference efficiency, because the images must first be decoded, which adds extra computation. (iii) In a real-world FR system that operates on RGB face images, user privacy may be compromised if attackers penetrate the system and gain access to the model's input. To address these issues, we propose two novel FR frameworks, TransFace and TransFace++, which explore the feasibility of applying Vision Transformers (ViTs) and image bytes to FR tasks, respectively. Experiments on popular face benchmarks demonstrate the superiority of TransFace and TransFace++. Code is available at https://github.com/DanJun6737/TransFace_pp.
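Feeding image bytes to a model, rather than decoded RGB pixels, means the input is simply the file's byte stream. The minimal sketch below, under stated assumptions, turns raw bytes into fixed-length token IDs and groups them into 1-D "patches" in the spirit of ViT patch tokens. It is not the paper's TIBC compression: `PAD_ID`, `max_len`, `patch_size`, and `bytes_to_patch_tokens` are all hypothetical, and a learned embedding table would normally map each ID to a vector afterward.

```python
from typing import List

PAD_ID = 256  # hypothetical padding token, chosen outside the byte range 0..255


def bytes_to_patch_tokens(data: bytes, max_len: int, patch_size: int) -> List[List[int]]:
    """Truncate or right-pad a raw byte stream to max_len token IDs, then
    split it into consecutive patches of patch_size IDs each."""
    if max_len % patch_size != 0:
        raise ValueError("max_len must be a multiple of patch_size")
    ids = list(data[:max_len])               # each byte becomes an int in 0..255
    ids += [PAD_ID] * (max_len - len(ids))   # pad short inputs up to max_len
    return [ids[i:i + patch_size] for i in range(0, max_len, patch_size)]


# Usage: the first bytes of a JPEG file (FF D8 is the start-of-image marker).
header = bytes([0xFF, 0xD8, 0xFF, 0xE0, 0x00, 0x10])
print(bytes_to_patch_tokens(header, max_len=8, patch_size=4))
# [[255, 216, 255, 224], [0, 16, 256, 256]]
```

No image decoding happens anywhere in this path; the model sees only byte values, which is what allows the input to be kept in an encrypted or compressed form.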
