A Multi-view Mask Contrastive Learning Graph Convolutional Neural Network for Age Estimation
Yiping Zhang, Yuntao Shou, Tao Meng, Wei Ai, Keqin Li
TL;DR
This work tackles facial age estimation under real-world variability by introducing MMCL-GCN, a graph-based two-stage framework. It combines a multi-view mask contrastive learning (MMCL) feature extractor with a graph neural encoder-decoder architecture and an asymmetric siamese setup, followed by an ML-IELM classifier/regressor for age grouping and precise age prediction. The approach leverages both reconstruction and contrastive objectives, formalized as $L_{MC}=\mu L_{rc}+(1-\mu)L_{cl}$ with detailed expressions for $L_{rc}$ and $L_{cl}$, and demonstrates superior performance on MORPH-II, Adience, and LAP-2016 datasets with robust generalization. The results indicate that integrating graph-based representations, masked modeling, and contrastive learning yields more accurate and resilient age estimation, with potential applicability to other vision tasks requiring complex structural understanding.
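The joint objective $L_{MC}=\mu L_{rc}+(1-\mu)L_{cl}$ is a convex combination of the reconstruction and contrastive terms. The NumPy sketch below shows only that weighting. The two component losses are stand-ins chosen for illustration (mean squared error for $L_{rc}$, an InfoNCE-style loss for $L_{cl}$), and the function names and the `temperature` parameter are assumptions, not the paper's definitions.

```python
import numpy as np


def reconstruction_loss(x, x_hat):
    """Stand-in for L_rc: mean squared error between original and
    reconstructed node features (an assumption, not the paper's formula)."""
    return float(np.mean((x - x_hat) ** 2))


def contrastive_loss(z_online, z_target, temperature=0.5):
    """Stand-in for L_cl: InfoNCE-style loss where row i of each view
    forms the positive pair (an assumption, not the paper's formula)."""
    a = z_online / np.linalg.norm(z_online, axis=1, keepdims=True)
    b = z_target / np.linalg.norm(z_target, axis=1, keepdims=True)
    logits = a @ b.T / temperature
    # Subtract the row maximum for numerical stability before the softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def joint_loss(l_rc, l_cl, mu):
    """L_MC = mu * L_rc + (1 - mu) * L_cl, with mu in [0, 1]."""
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    return mu * l_rc + (1.0 - mu) * l_cl
```

Setting `mu = 1` reduces the objective to pure reconstruction and `mu = 0` to pure contrastive learning, so `mu` controls how the two mechanisms are traded off.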
Abstract
The age estimation task aims to predict a person's age from facial features and is widely used in public security, marketing, identification, and other fields. However, age-related features are concentrated mainly around facial keypoints, and existing CNN- and Transformer-based methods are inflexible and redundant when modeling such complex, irregular structures. Therefore, this paper proposes a Multi-view Mask Contrastive Learning Graph Convolutional Neural Network (MMCL-GCN) for age estimation. Specifically, the MMCL-GCN network consists of a feature extraction stage and an age estimation stage. In the feature extraction stage, we represent face images as graphs and use them as input, then design a Multi-view Mask Contrastive Learning (MMCL) mechanism to learn complex structural and semantic information about face images. The learning mechanism employs an asymmetric siamese network architecture: an online encoder-decoder reconstructs the missing information from the original graph, and a target encoder learns latent representations for contrastive learning. Furthermore, to make the two learning mechanisms more compatible and complementary, we adopt two augmentation strategies and optimize the joint loss. In the age estimation stage, we design a Multi-layer Extreme Learning Machine with identity mapping (ML-IELM) to make full use of the features extracted by the online encoder. A classifier and a regressor are then built on ML-IELM to identify the age group interval and to estimate the final age, respectively. Extensive experiments show that MMCL-GCN effectively reduces age estimation error on benchmark datasets such as Adience, MORPH-II, and LAP-2016.
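The second stage (classify the age group, then regress the age within that group) can be sketched as follows. This is a minimal illustration under stated assumptions: it uses a plain single-hidden-layer extreme learning machine (random fixed input weights, ridge-regularized least-squares output weights) as a simplified stand-in for ML-IELM, without the multi-layer structure or the identity mapping. The class `ELM`, the helpers `fit_two_stage` and `predict_two_stage`, and the `bin_edges` argument are hypothetical names, and `X` stands for features that would come from the online encoder.

```python
import numpy as np


class ELM:
    """Single-hidden-layer extreme learning machine (simplified stand-in
    for ML-IELM; no stacking and no identity mapping)."""

    def __init__(self, n_hidden=64, reg=1e-2, seed=0):
        self.n_hidden = n_hidden
        self.reg = reg
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, T):
        # Input weights are drawn at random and never trained.
        self.W = self.rng.normal(size=(X.shape[1], self.n_hidden))
        self.b = self.rng.normal(size=self.n_hidden)
        H = self._hidden(X)
        # Output weights: closed-form ridge-regularized least squares.
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        return self._hidden(X) @ self.beta


def fit_two_stage(X, ages, bin_edges):
    """Fit one group classifier plus one regressor per observed age group."""
    groups = np.digitize(ages, bin_edges)
    onehot = np.eye(len(bin_edges) + 1)[groups]
    clf = ELM(seed=0).fit(X, onehot)
    regs = {}
    for g in np.unique(groups):
        mask = groups == g
        regs[int(g)] = ELM(seed=1 + int(g)).fit(X[mask], ages[mask, None])
    return clf, regs


def predict_two_stage(clf, regs, X):
    """Pick the highest-scoring seen group, then apply its regressor."""
    seen = sorted(regs)
    scores = clf.predict(X)[:, seen]
    g_pred = np.array(seen)[np.argmax(scores, axis=1)]
    out = np.empty(len(X))
    for g in seen:
        mask = g_pred == g
        if mask.any():
            out[mask] = regs[g].predict(X[mask])[:, 0]
    return out
```

Splitting the problem this way lets each regressor specialize on a narrow age range, at the cost that a misclassified group propagates into the final estimate.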
