The Good, the Better, and the Best: Improving the Discriminability of Face Embeddings through Attribute-aware Learning

Ana Dias, João Ribeiro Pinto, Hugo Proença, João C. Neves

Abstract

Despite recent advances in face recognition, robust performance remains challenging under large variations in age, pose, and occlusion. A common strategy to address these issues is to guide representation learning with auxiliary supervision from facial attributes, encouraging the visual encoder to focus on identity-relevant regions. However, existing approaches typically rely on heterogeneous and fixed sets of attributes, implicitly assuming equal relevance across attributes. This assumption is suboptimal, as different attributes exhibit varying discriminative power for identity recognition, and some may even introduce harmful biases. In this paper, we propose an attribute-aware face recognition architecture that supervises the learning of facial embeddings using identity class labels, identity-relevant facial attributes, and non-identity-related attributes. Facial attributes are organized into interpretable groups, making it possible to decompose and analyze their individual contributions in a human-understandable manner. Experiments on standard face verification benchmarks demonstrate that joint learning of identity and facial attributes improves the discriminability of face embeddings, and they support two major conclusions: (i) using identity-relevant subsets of facial attributes consistently outperforms supervision with a broader attribute set, and (ii) explicitly forcing embeddings to unlearn non-identity-related attributes yields further performance gains compared to leaving such attributes unsupervised. Additionally, our method serves as a diagnostic tool for assessing the trustworthiness of face recognition encoders: it measures the accuracy gained by suppressing non-identity-related attributes, and such gains suggest shortcut learning from redundant attributes associated with each identity.

Paper Structure (16 sections, 9 equations, 2 figures, 4 tables)

Figures (2)

  • Figure 1: From fixed attribute supervision to selective prediction and suppression. (a) Prior multi-task face recognition methods use a fixed attribute list as an auxiliary prediction task alongside identity learning. (b) Our approach organizes attributes into groups and controls their influence during training: groups can be predicted or selectively suppressed, while the identity learning objective remains unchanged.
  • Figure 2: Overview of the proposed attribute-aware face recognition architecture. A shared visual encoder produces an identity embedding used for identity recognition and for attribute-based auxiliary supervision. Attribute groups selected for prediction are supervised directly, while groups selected for suppression are connected through gradient reversal layers (GRL) to discourage attribute-specific information in the identity embedding.
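
The Figure 2 caption describes two routes by which attribute heads influence the shared identity embedding: direct supervision for predicted groups, and a gradient reversal layer (GRL) for suppressed groups. The following is a minimal, framework-free sketch of that gradient routing, not the authors' implementation. It assumes the standard GRL behavior (identity in the forward pass, gradient scaled by a negative factor in the backward pass). The names `GradientReversal`, `embedding_gradient`, the reversal strength `lam`, and the group names `"nose"` and `"accessories"` are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass
from typing import Dict, List

Vector = List[float]


@dataclass
class GradientReversal:
    """Identity in the forward pass; multiplies gradients by -lam on the way back."""

    lam: float = 1.0  # hypothetical reversal strength, not specified in the source

    def forward(self, embedding: Vector) -> Vector:
        # The attribute head sees the embedding unchanged.
        return list(embedding)

    def backward(self, grad_output: Vector) -> Vector:
        # The encoder receives the negated (and scaled) head gradient.
        return [-self.lam * g for g in grad_output]


def embedding_gradient(
    identity_grad: Vector,
    head_grads: Dict[str, Vector],
    modes: Dict[str, str],
    lam: float = 1.0,
) -> Vector:
    """Sum the gradients that reach the shared identity embedding.

    head_grads maps each attribute group to the gradient of its loss with
    respect to the embedding; modes maps each group to "predict" or "suppress".
    """
    grl = GradientReversal(lam)
    total = list(identity_grad)  # the identity objective is always applied as-is
    for group, grad in head_grads.items():
        mode = modes[group]
        if mode == "suppress":
            grad = grl.backward(grad)  # reversed: pushes attribute information out
        elif mode != "predict":
            raise ValueError(f"unknown mode for group {group!r}: {mode!r}")
        total = [t + g for t, g in zip(total, grad)]
    return total


# Toy example with made-up gradients for two hypothetical attribute groups.
example = embedding_gradient(
    identity_grad=[0.5, -0.25],
    head_grads={"nose": [0.25, 0.25], "accessories": [1.0, -0.5]},
    modes={"nose": "predict", "accessories": "suppress"},
    lam=0.5,
)
print(example)
```

In this sketch, switching a group from `"predict"` to `"suppress"` only flips the sign (and scale) of that group's contribution to the encoder update, while the identity gradient is left untouched, mirroring the caption's statement that the identity learning objective remains unchanged.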