Joint Person Identity, Gender and Age Estimation from Hand Images using Deep Multi-Task Representation Learning

Nathanael L. Baisa

TL;DR

The paper addresses the problem of jointly estimating identity, gender, and age from hand images to assist criminal investigations. It introduces IGAE-Net, a multi-task representation learning framework with three heads (identity, gender, age group) built on multiple backbone architectures (CNNs and transformers) and trained with a joint cross-entropy loss. The method uses label smoothing and a unified objective $L = \sum_{l=1}^{3} L_{l,\mathrm{xent}}$ to train end-to-end, and it is evaluated on the 11k hands dataset, revealing that ConvNeXt-Tiny (CNN) and Swin-T (transformer) offer top performance with strong gender accuracy across sub-datasets. The work demonstrates the feasibility of extracting multiple descriptive attributes from hand imagery for practical use in law enforcement, potentially improving identification and profiling in investigations.
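
The joint objective can be illustrated with a small, dependency-free Python sketch. It assumes the common uniform label-smoothing formulation and a smoothing factor of `eps = 0.1`; neither the value nor the function names (`smoothed_xent`, `joint_loss`) come from the paper, so treat this as an illustration rather than the authors' implementation.

```python
import math

def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]

def smoothed_xent(logits, target, eps=0.1):
    """Cross-entropy against a label-smoothed target distribution.

    The true class gets weight 1 - eps + eps/K, every other class eps/K.
    eps=0.1 is an assumed value, not taken from the paper.
    """
    k = len(logits)
    logp = log_softmax(logits)
    loss = 0.0
    for i, lp in enumerate(logp):
        q = eps / k + (1.0 - eps if i == target else 0.0)
        loss -= q * lp
    return loss

def joint_loss(head_logits, head_targets, eps=0.1):
    """Unweighted sum of the per-head losses (identity, gender, age group)."""
    return sum(smoothed_xent(z, t, eps) for z, t in zip(head_logits, head_targets))
```

Calling `joint_loss([id_logits, gender_logits, age_logits], [id_label, gender_label, age_label])` returns a single scalar, so all three heads and the shared backbone receive gradients from one objective when the same idea is implemented in an autodiff framework.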

Abstract

In this paper, we propose a multi-task representation learning framework to jointly estimate the identity, gender and age of individuals from their hand images for the purpose of criminal investigations since the hand images are often the only available information in cases of serious crime such as sexual abuse. We investigate different up-to-date deep learning architectures and compare their performance for joint estimation of identity, gender and age from hand images of perpetrators of serious crime. To simplify the age prediction, we create age groups for the age estimation. We make extensive evaluations and comparisons of both convolution-based and transformer-based deep learning architectures on a publicly available 11k hands dataset. Our experimental analysis shows that it is possible to efficiently estimate not only identity but also other attributes such as gender and age of suspects jointly from hand images for criminal investigations, which is crucial in assisting international police forces in the court to identify and convict abusers.
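
Turning age estimation into age-group classification amounts to binning each subject's age. The sketch below shows one way to do that in Python; the bin edges in `AGE_EDGES` are hypothetical placeholders, since the groups actually used in the paper are not listed in this summary.

```python
from bisect import bisect_right

# Hypothetical bin edges in years; the paper's actual age groups are not given here.
AGE_EDGES = [20, 25, 30, 40]

def age_to_group(age, edges=AGE_EDGES):
    """Return the index of the age group containing `age`.

    Group 0 is age < edges[0]; group len(edges) is age >= edges[-1].
    Each edge is the inclusive lower bound of the group that follows it.
    """
    return bisect_right(edges, age)
```

With these placeholder edges, the age head would be a 5-way classifier, and the class index produced by `age_to_group` serves as the training label.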

Paper Structure (9 sections, 3 equations, 4 figures, 4 tables)

This paper contains 9 sections, 3 equations, 4 figures, 4 tables.

Figures (4)

  • Figure 1: Structure of IGAE-Net. Given an input hand image, feature vector f is obtained by passing it through the backbone network and then is fed into each head (identity head, gender head and age head). The identity head, gender head and age head predict the identity, gender and age group of the input image, respectively.
  • Figure 2: Age statistics for the right dorsal subset of the 11k hands dataset [Mah19]: (a) age distribution, (b) age group distribution. The number of images per age or age group is shown.
  • Figure 3: Confusion matrices on the right dorsal subset of the 11k hands dataset [Mah19] using Swin-T [ZeYutYue21]-based IGAE-Net: (a) identity outputs, (b) gender outputs, (c) age group outputs.
  • Figure 4: Some qualitative results of the proposed method on the right dorsal subset of the 11k hands dataset [Mah19] using Swin-T [ZeYutYue21]-based IGAE-Net. For each hand image, the ground truth labels (GT) and the predicted labels (PR) for identity, gender and age group are shown.
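
The structure described in the Figure 1 caption (one shared feature vector f fed into three heads) can be sketched without any deep learning library. Everything concrete here is an assumption made for illustration: the heads are single linear layers, the feature size is 8, the class counts are toy values, and a random vector stands in for the backbone output.

```python
import random

def linear(x, weights, bias):
    """Affine map: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]

def make_head(in_dim, out_dim, rng):
    """Randomly initialised linear head, returned as (weights, bias)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(in_dim)] for _ in range(out_dim)]
    return weights, [0.0] * out_dim

def forward(f, heads):
    """Feed the shared feature vector f into every head; return logits per task."""
    return {name: linear(f, w, b) for name, (w, b) in heads.items()}

def predict(logits):
    """Arg-max class index for each task."""
    return {name: max(range(len(z)), key=z.__getitem__) for name, z in logits.items()}

rng = random.Random(0)
feat_dim = 8  # assumed toy size; real backbones emit much larger feature vectors
heads = {
    "identity": make_head(feat_dim, 5, rng),   # 5 identities (toy value)
    "gender": make_head(feat_dim, 2, rng),
    "age_group": make_head(feat_dim, 4, rng),  # 4 age groups (toy value)
}
f = [rng.random() for _ in range(feat_dim)]  # stand-in for the backbone output
logits = forward(f, heads)
print(predict(logits))
```

Because every head reads the same f, swapping the backbone (for example a CNN for a transformer) changes only how f is produced, which is what makes the backbone comparison in the paper straightforward.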