Underage Detection through a Multi-Task and MultiAge Approach for Screening Minors in Unconstrained Imagery

Christopher Gaul, Eduardo Fidalgo, Enrique Alegre, Rocío Alaiz Rodríguez, Eri Pérez Corral

TL;DR

The paper tackles robust minor detection and age estimation in unconstrained imagery by proposing a multi-task architecture that couples a frozen FaRL backbone with a dedicated MultiAge head. It introduces the Overall Underage Benchmark and the ASWIFT-20k wild test to stress domain shifts and real-world conditions, and demonstrates that age-balanced resampling together with an age-gap mechanism improves underage detection and age regression across multiple age thresholds. The approach achieves a strong balance between detection recall and precision, especially under constrained FPR limits, and improves on baselines on wild data while revealing demographic biases that warrant further data and ethical safeguards. Overall, the work provides a practical framework for robust minor protection in automated systems, highlighting both methodological gains and areas that need further work on fairness and data augmentation.

Abstract

Accurate automatic screening of minors in unconstrained images requires models robust to distribution shift and resilient to the under-representation of children in public datasets. To address these issues, we propose a multi-task architecture with dedicated under/over-age discrimination tasks based on a frozen FaRL vision-language backbone joined with a compact two-layer MLP that shares features across one age-regression head and four binary underage heads (12, 15, 18, and 21 years). This design focuses on the legally critical age range while keeping the backbone frozen. Class imbalance is mitigated through an $\alpha$-reweighted focal loss and age-balanced mini-batch sampling, while an age gap removes ambiguous samples near thresholds. Evaluation is conducted on our new Overall Underage Benchmark (303k cleaned training images, 110k test images), which defines both the "ASORES-39k" restricted overall test, which removes the noisiest domains, and the 20k-image age estimation wild-shifts test "ASWIFT-20k", which stresses extreme poses ($>45°$), expressions, and low image quality to emulate real-world shifts. Trained on the cleaned overall set with resampling and age gap, our multiage model "F" reduces the mean absolute error on ASORES-39k from 4.175 y (age-only baseline) to 4.068 y and improves under-18 detection from an F2 score of 0.801 to 0.857 at a 1% false-adult rate. On ASWIFT-20k, the same configuration nearly sustains 0.99 recall while F2 rises from 0.742 to 0.833, demonstrating robustness to domain shift.
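
The training signal described in the abstract (four binary underage heads, an age gap around each threshold, and an $\alpha$-reweighted focal loss) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the gap half-width `AGE_GAP`, the values of `alpha` and `gamma`, and the function names `underage_targets` and `focal_loss` are assumptions chosen for illustration, and the rule "ignore a sample for a head when its age lies within the gap of that head's threshold" is one plausible reading of "removes ambiguous samples near thresholds".

```python
import numpy as np

THRESHOLDS = np.array([12, 15, 18, 21])  # age thresholds named in the abstract
AGE_GAP = 1.0  # hypothetical half-width in years; the abstract gives no value


def underage_targets(ages, thresholds=THRESHOLDS, gap=AGE_GAP):
    """Return (labels, mask), each of shape (n_samples, n_thresholds).

    labels[i, k] is 1.0 if sample i is younger than thresholds[k], else 0.0.
    mask[i, k] is False when the age lies within `gap` years of thresholds[k],
    so that the sample is ignored by head k (assumed age-gap rule).
    """
    ages = np.asarray(ages, dtype=float)[:, None]
    labels = (ages < thresholds[None, :]).astype(float)
    mask = np.abs(ages - thresholds[None, :]) >= gap
    return labels, mask


def focal_loss(logits, labels, mask, alpha=0.75, gamma=2.0):
    """Alpha-weighted binary focal loss, averaged over unmasked entries.

    alpha and gamma are illustrative defaults, not values from the paper.
    """
    p = 1.0 / (1.0 + np.exp(-logits))                    # sigmoid
    p_t = np.where(labels == 1, p, 1.0 - p)              # prob. of true class
    alpha_t = np.where(labels == 1, alpha, 1.0 - alpha)  # class reweighting
    loss = -alpha_t * (1.0 - p_t) ** gamma * np.log(np.clip(p_t, 1e-12, 1.0))
    return float((loss * mask).sum() / max(mask.sum(), 1))


ages = [10, 17.5, 18.2, 30]
labels, mask = underage_targets(ages)
uninformed = focal_loss(np.zeros_like(labels), labels, mask)
confident = focal_loss((2 * labels - 1) * 10.0, labels, mask)
```

In this sketch the samples aged 17.5 and 18.2 are masked out of the under-18 head only, and still contribute to the other three heads. The focal term $(1 - p_t)^\gamma$ down-weights confidently correct predictions, which is why `confident` is much smaller than `uninformed`.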

Paper Structure

This paper contains 32 sections, 9 equations, 9 figures, 10 tables.

Figures (9)

  • Figure 1: Sample face patches from the different age estimation benchmarks, as processed by the model during training, i.e., with the random flipping, scaling, and cropping described in Section \ref{sec_methodology_model}.
  • Figure 2: The age distribution of the training and test data. (a) Number of images for each source dataset (colors) and for the merged corpus (black). (b) The same curves for the whole age range, normalized by the respective dataset sizes. (c) Cumulative histogram of the overall test, the ASORES-39k restricted test, and the ASWIFT-20k test and its subsets. Note that the "Pose-45" subset, i.e., those faces with a pose angle beyond 45°, comes mostly from the Dartmouth Dataset of Children's Faces and thus contains mostly children.
  • Figure 3: Distribution of ASORES-39k and ASWIFT-20k over gender, ethnicity (FairFace \cite{Karkkainen_2021_FairFace}), and age.
  • Figure 4: Selection of facial expressions for ASWIFT-20k. (a) Distribution of the selected "extreme" expressions in arousal-valence space. (b) Example images for some expressions.
  • Figure 5: Scheme of the model architecture: The face patch extracted from the image scene is divided into subpatches ($14\times14$ non-overlapping patches of $16\times16$ pixels), transformed with the FaRL image encoder \cite{Zheng_2022_FaRL}, and then processed by our "MultiAge" network. Apart from an age estimate (with its typical estimation error), several outputs indicate whether the subject is likely to be under a given age threshold. As discussed in Section \ref{sec_experiments} below, these underage outputs are more reliable than using the age estimate directly.
  • ...and 4 more figures
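
The subpatch layout in the Figure 5 caption ($14\times14$ non-overlapping patches of $16\times16$ pixels) implies a $224\times224$ face patch split into 196 tokens. The NumPy sketch below only illustrates that geometry; the function name `to_patches` is hypothetical, and in practice the image encoder performs this step internally as part of its patch embedding.

```python
import numpy as np

PATCH = 16           # pixels per patch side (from the caption)
GRID = 14            # patches per image side (from the caption)
SIDE = PATCH * GRID  # 224, the input resolution implied by the caption


def to_patches(image):
    """Split an (H, W, C) image into (GRID*GRID, PATCH, PATCH, C) patches.

    Patches are ordered row by row, so index r * GRID + c is the patch in
    grid row r and grid column c.
    """
    h, w, c = image.shape
    assert h == SIDE and w == SIDE, "expected a 224x224 face patch"
    x = image.reshape(GRID, PATCH, GRID, PATCH, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (grid_row, grid_col, py, px, channel)
    return x.reshape(GRID * GRID, PATCH, PATCH, c)


image = np.arange(SIDE * SIDE * 3).reshape(SIDE, SIDE, 3)
patches = to_patches(image)
```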