A Multi-view Mask Contrastive Learning Graph Convolutional Neural Network for Age Estimation

Yiping Zhang, Yuntao Shou, Tao Meng, Wei Ai, Keqin Li

TL;DR

This work tackles facial age estimation under real-world variability with MMCL-GCN, a graph-based two-stage framework. The first stage is a multi-view mask contrastive learning (MMCL) feature extractor built from a graph encoder-decoder in an asymmetric siamese setup. The second stage is an ML-IELM classifier and regressor that identify the age group and estimate the exact age. The extractor is trained with a joint reconstruction and contrastive objective, $L_{MC}=\mu L_{rc}+(1-\mu)L_{cl}$, where $L_{rc}$ is the reconstruction loss, $L_{cl}$ is the contrastive loss, and $\mu$ weights the two. MMCL-GCN reports superior performance and robust generalization on the MORPH-II, Adience, and LAP-2016 datasets. The results indicate that combining graph-based representations, masked modeling, and contrastive learning yields more accurate and resilient age estimation, and the approach may transfer to other vision tasks that require understanding complex structure.
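
The joint objective can be evaluated numerically. The following is a minimal sketch that assumes concrete loss forms not given in this summary: $L_{rc}$ is taken as the mean squared error over masked nodes, and $L_{cl}$ as an InfoNCE loss in which matching rows of the online and target embeddings are positives and all other rows in the batch are negatives. The helper names and the temperature `tau` are hypothetical; only the weighting $\mu L_{rc}+(1-\mu)L_{cl}$ comes from the text.

```python
import numpy as np


def reconstruction_loss(x_true: np.ndarray, x_pred: np.ndarray, mask: np.ndarray) -> float:
    """Assumed L_rc: mean squared error, computed only on masked nodes."""
    diff = x_true[mask] - x_pred[mask]
    return float(np.mean(np.sum(diff ** 2, axis=1)))


def info_nce_loss(z_online: np.ndarray, z_target: np.ndarray, tau: float = 0.5) -> float:
    """Assumed L_cl: InfoNCE over cosine similarities.

    Row i of z_online and row i of z_target form the positive pair;
    every other row of z_target acts as a negative for row i.
    """
    a = z_online / np.linalg.norm(z_online, axis=1, keepdims=True)
    b = z_target / np.linalg.norm(z_target, axis=1, keepdims=True)
    logits = a @ b.T / tau
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))


def joint_loss(l_rc: float, l_cl: float, mu: float) -> float:
    """L_MC = mu * L_rc + (1 - mu) * L_cl, with mu in [0, 1]."""
    if not 0.0 <= mu <= 1.0:
        raise ValueError("mu must lie in [0, 1]")
    return mu * l_rc + (1.0 - mu) * l_cl


# Usage with random stand-in tensors (no real model or data involved).
rng = np.random.default_rng(0)
x = rng.normal(size=(196, 64))                     # e.g. 196 patch nodes, 64-dim features
x_hat = x + 0.1 * rng.normal(size=x.shape)         # stand-in for the decoder output
mask = rng.random(196) < 0.5                       # nodes that were masked
z_on = rng.normal(size=(8, 32))                    # online-branch embeddings, batch of 8
z_tg = z_on + 0.05 * rng.normal(size=z_on.shape)   # target-branch embeddings
loss = joint_loss(reconstruction_loss(x, x_hat, mask), info_nce_loss(z_on, z_tg), mu=0.5)
```

NumPy has no autodiff, so this only evaluates loss values; in training, a framework would backpropagate $L_{MC}$ through the online branch.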

Abstract

The age estimation task aims to predict a person's age from facial features and is widely used in public security, marketing, identification, and other fields. However, age-related features are concentrated mainly around facial keypoints, and existing CNN- and Transformer-based methods are inflexible and redundant when modeling such complex, irregular structures. Therefore, this paper proposes a Multi-view Mask Contrastive Learning Graph Convolutional Neural Network (MMCL-GCN) for age estimation. Specifically, MMCL-GCN consists of a feature extraction stage and an age estimation stage. In the feature extraction stage, we construct graph-structured inputs from face images and design a Multi-view Mask Contrastive Learning (MMCL) mechanism to learn complex structural and semantic information about face images. The mechanism employs an asymmetric siamese network architecture: an online encoder-decoder reconstructs the missing information from the original graph, while a target encoder learns latent representations for contrastive learning. Furthermore, to make the two learning mechanisms more compatible and complementary, we adopt two augmentation strategies and optimize a joint loss. In the age estimation stage, we design a Multi-layer Extreme Learning Machine with identity mapping (ML-IELM) to make full use of the features extracted by the online encoder. A classifier and a regressor are then built on ML-IELM to identify the age-group interval and to estimate the final age precisely. Extensive experiments show that MMCL-GCN effectively reduces age estimation error on benchmark datasets including Adience, MORPH-II, and LAP-2016.
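
The age estimation stage builds on extreme learning machines (ELMs), whose hidden weights are random and whose output weights are solved in closed form by regularized least squares, $\beta=(H^\top H+I/C)^{-1}H^\top T$. The sketch below is a minimal illustration, not the authors' ML-IELM: it assumes each layer is an ELM autoencoder, as in common multi-layer ELM formulations, interprets "identity mapping" as a residual shortcut that adds each layer's input to its output, and trains the classifier and regressor independently on synthetic stand-in features (how the predicted age group conditions the regressor is not specified here). The class name `MLIELMSketch`, the tanh activation, and the constant `c` are hypothetical.

```python
import numpy as np


def ridge_solve(h: np.ndarray, t: np.ndarray, c: float) -> np.ndarray:
    """Closed-form ELM output weights: beta = (H^T H + I / C)^{-1} H^T T."""
    return np.linalg.solve(h.T @ h + np.eye(h.shape[1]) / c, h.T @ t)


def add_bias(h: np.ndarray) -> np.ndarray:
    """Append a constant column so the output layer can learn an intercept."""
    return np.hstack([h, np.ones((h.shape[0], 1))])


class MLIELMSketch:
    """Hypothetical multi-layer ELM with identity (residual) mappings.

    Each layer is an ELM autoencoder: a random hidden layer whose output
    weights are solved in closed form to reconstruct the layer input. Those
    weights then map the input to a new representation, and an identity
    shortcut adds the input back (assumed meaning of "identity mapping").
    """

    def __init__(self, n_layers: int = 2, c: float = 100.0, seed: int = 0):
        self.n_layers, self.c, self.seed = n_layers, c, seed

    def _layer(self, h: np.ndarray, beta: np.ndarray) -> np.ndarray:
        return np.tanh(h @ beta.T) + h  # transformed features + identity shortcut

    def fit(self, x: np.ndarray, t: np.ndarray) -> "MLIELMSketch":
        rng = np.random.default_rng(self.seed)
        d = x.shape[1]
        self.betas = []
        h = x
        for _ in range(self.n_layers):
            w = rng.normal(size=(d, d)) / np.sqrt(d)  # random hidden weights, never tuned
            b = rng.normal(size=d)
            hidden = np.tanh(h @ w + b)
            beta = ridge_solve(hidden, h, self.c)     # (d, d): hidden -> layer input
            self.betas.append(beta)
            h = self._layer(h, beta)
        self.out = ridge_solve(add_bias(h), t, self.c)  # task head (classifier or regressor)
        return self

    def predict(self, x: np.ndarray) -> np.ndarray:
        h = x
        for beta in self.betas:
            h = self._layer(h, beta)
        return add_bias(h) @ self.out


# Usage on synthetic data (not real encoder features or ages).
rng = np.random.default_rng(1)
feats = rng.normal(size=(200, 16))                 # stand-in for online-encoder features
ages = 5.0 * feats[:, :3].sum(axis=1) + 40.0       # synthetic ages
groups = np.digitize(ages, [30.0, 40.0, 50.0])     # 4 hypothetical age groups
clf = MLIELMSketch().fit(feats, np.eye(4)[groups])  # one-hot group targets
reg = MLIELMSketch().fit(feats, ages[:, None])      # scalar age targets
pred_group = clf.predict(feats).argmax(axis=1)
pred_age = reg.predict(feats).ravel()
```

Because only linear systems are solved, training needs no gradient descent, which is the usual motivation for ELM-based heads on top of pretrained features.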

Paper Structure (20 sections, 23 equations, 9 figures, 5 tables)

Figures (9)

  • Figure 1: Illustration of how CNNs, Transformers, and GNNs represent images. a) A CNN uses convolution operators to extract features from images with a grid structure. b) A Transformer uses the attention mechanism to extract features from images with a sequence structure. c) A GNN uses information aggregation to extract features from images with a graph structure. A graph structure subsumes sequences and grids as special cases and can capture long-range contextual information, so GNNs extract image features with more flexibility and less redundancy.
  • Figure 2: The overall framework of MMCL-GCN, which has two stages: a feature extraction stage and an age estimation stage. In the feature extraction stage, we construct graph-structured inputs from face images and design a Multi-view Mask Contrastive Learning (MMCL) mechanism to learn complex structural and semantic information about face images. MMCL uses an online encoder-decoder to reconstruct the masked patches of the original graph and a target encoder for contrastive learning. It also applies graph augmentation strategies and optimizes a joint loss so that the two mechanisms are more compatible and complementary.
  • Figure 3: Examples of the main graph augmentation methods: Edge Dropping and Node Masking.
  • Figure 4: Illustration of the training objective. MMCL adopts an asymmetric siamese network, applies graph augmentations, and minimizes the joint loss so that the two mechanisms combine organically.
  • Figure 5: The illustration of multilayer ELM based on identity mapping.
  • ...and 4 more figures
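
Figures 2 and 3 describe turning a face image into a graph whose nodes are patches and then augmenting it with Edge Dropping and Node Masking. The sketch below is a minimal illustration under assumptions not stated in this summary: nodes are non-overlapping square patches with flattened pixel features, each node is linked to its k nearest neighbours in feature space (a construction used by Vision GNN), edges are dropped independently with probability `p`, and masked nodes are zeroed in place of a learnable mask token. The function names and the parameters `patch`, `k`, `p`, and `ratio` are hypothetical.

```python
import numpy as np


def image_to_patch_graph(img: np.ndarray, patch: int = 16, k: int = 8):
    """Turn an (H, W, C) image into a patch graph (assumed construction).

    Nodes: non-overlapping patch x patch tiles, flattened into feature vectors.
    Edges: each node points to its k nearest neighbours in feature space.
    """
    h, w, c = img.shape
    gh, gw = h // patch, w // patch  # leftover rows/columns are cropped
    x = (img[: gh * patch, : gw * patch].astype(np.float64)
         .reshape(gh, patch, gw, patch, c)
         .transpose(0, 2, 1, 3, 4)
         .reshape(gh * gw, patch * patch * c))
    sq = (x ** 2).sum(axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * x @ x.T  # squared Euclidean distances
    np.fill_diagonal(d2, np.inf)                    # exclude self-loops
    nbrs = np.argsort(d2, axis=1)[:, :k]
    src = np.repeat(np.arange(x.shape[0]), k)
    edges = np.stack([src, nbrs.ravel()])           # (2, N * k) directed edge list
    return x, edges


def drop_edges(edges: np.ndarray, p: float, rng: np.random.Generator) -> np.ndarray:
    """Edge Dropping: remove each edge independently with probability p."""
    keep = rng.random(edges.shape[1]) >= p
    return edges[:, keep]


def mask_nodes(x: np.ndarray, ratio: float, rng: np.random.Generator):
    """Node Masking: zero the features of a random subset of nodes.

    Returns the masked features and a boolean mask marking the nodes
    a decoder would be asked to reconstruct.
    """
    n = x.shape[0]
    idx = rng.choice(n, size=int(round(ratio * n)), replace=False)
    x_aug = x.copy()
    x_aug[idx] = 0.0  # zeros stand in for a learnable mask token
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    return x_aug, mask


# Usage on a random stand-in image rather than a real face.
rng = np.random.default_rng(0)
img = rng.random((224, 224, 3))
x, edges = image_to_patch_graph(img, patch=16, k=8)  # 196 nodes, 1568 edges
edges_aug = drop_edges(edges, p=0.2, rng=rng)
x_aug, mask = mask_nodes(x, ratio=0.5, rng=rng)
```

Applying the two augmentations with different random draws yields the multiple views that the online and target branches consume.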