Neglected Risks: The Disturbing Reality of Children's Images in Datasets and the Urgent Call for Accountability

Carlos Caetano, Gabriel O. dos Santos, Caio Petrucci, Artur Barros, Camila Laranjeira, Leo S. F. Ribeiro, Júlia F. de Mendonça, Jefersson A. dos Santos, Sandra Avila

TL;DR

The paper tackles ethical risks from including children's images in large AI datasets and proposes a Vision-Language Model–based pipeline to detect and remove such images. It evaluates multiple VLMs and prompt styles on the #PraCegoVer and Open Images V7 datasets to maximize recall while managing false positives, highlighting both feasibility and measurement challenges. Key findings show high recall is achievable with careful prompt design, but annotation biases and dataset quality significantly limit generalization, and removing child images may impact downstream tasks. The work serves as a baseline for responsible dataset curation, underscores the need for safeguards, and calls for further research into lightweight methods, bias assessment, and policy-driven approaches to protect children's rights in AI systems.

Abstract

Including children's images in datasets has raised ethical concerns, particularly regarding privacy, consent, data protection, and accountability. These datasets, often built by scraping publicly available images from the Internet, can expose children to risks such as exploitation, profiling, and tracking. Despite the growing recognition of these issues, approaches for addressing them remain limited. We explore the ethical implications of using children's images in AI datasets and propose a pipeline to detect and remove such images. As a use case, we built the pipeline on a Vision-Language Model under the Visual Question Answering task and tested it on the #PraCegoVer dataset. We also evaluate the pipeline on a subset of 100,000 images from the Open Images V7 dataset to assess its effectiveness in detecting and removing images of children. The pipeline serves as a baseline for future research, providing a starting point for more comprehensive tools and methodologies. While we leverage existing models trained on potentially problematic data, our goal is to expose and address this issue. We do not advocate for training or deploying such models, but instead call for urgent community reflection and action to protect children's rights. Ultimately, we aim to encourage the research community to exercise more than merely additional care in creating new datasets and to inspire the development of tools to protect the fundamental rights of vulnerable groups, particularly children.

Paper Structure

This paper contains 29 sections, 3 figures, 3 tables.

Figures (3)

  • Figure 1: The methodology overview is divided into two main steps. (i) Building a Reliable Benchmark illustrates the steps employed to curate a subset of the #PraCegoVer dataset, leveraging metadata to ensure reliable annotations for distinguishing children from adults. (ii) Evaluation and Automated Detection details the pipeline based on VLMs, designed to assess whether an image contains a child.
  • Figure 2: Examples of False Positive (FP) and False Negative (FN) classifications in the #PraCegoVer subset. The FP examples include (a) a woman holding dolls, mistakenly identified as containing a child, and (b) a crowded scene with no children. The FN case (c) depicts a teenager, misclassified as not containing a child.
  • Figure 3: Representative examples of False Positives (FP) and False Negatives (FN) identified in the Open Images V7 subset. (a) Mislabeled or missing annotations. (b) Crowded scenes with numerous individuals frequently resulted in FP classifications. (c) Teenagers were challenging to classify due to their resemblance to adults, leading to FN errors. (d) Double annotations, where the same individual was labeled as both "Boy" and "Man" or "Girl" and "Woman". To protect privacy, the faces of all individuals in the images were pixelated. These examples highlight annotation inconsistencies and dataset complexities that influenced the model performance.
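
The detection step in Figure 1 and the FP/FN cases in Figures 2 and 3 can be illustrated with a minimal sketch. It is not the authors' implementation: the prompt text, the `ask_vlm` callable, and the helper names are hypothetical, and a stub that returns canned answers stands in for a real Vision-Language Model. It shows only the general shape of a yes/no VQA filter and how recall and precision follow from the flagged set.

```python
from typing import Callable, Iterable, List, Set, Tuple

# Hypothetical prompt; the prompt styles evaluated in the paper are not reproduced here.
PROMPT = "Is there a child in this image? Answer yes or no."


def contains_child(answer: str) -> bool:
    """Treat any answer that starts with 'yes' (case-insensitive) as a positive flag."""
    return answer.strip().lower().startswith("yes")


def filter_dataset(
    image_ids: Iterable[str],
    ask_vlm: Callable[[str, str], str],
) -> Tuple[List[str], List[str]]:
    """Split image ids into (kept, removed) based on the model's yes/no answer."""
    kept: List[str] = []
    removed: List[str] = []
    for image_id in image_ids:
        if contains_child(ask_vlm(image_id, PROMPT)):
            removed.append(image_id)
        else:
            kept.append(image_id)
    return kept, removed


def recall_precision(removed: Iterable[str], child_ids: Set[str]) -> Tuple[float, float]:
    """Compute (recall, precision) of the removed set against ground-truth child images."""
    removed_set = set(removed)
    tp = len(removed_set & child_ids)   # children correctly flagged
    fp = len(removed_set - child_ids)   # non-child images flagged (false positives)
    fn = len(child_ids - removed_set)   # children missed (false negatives)
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


# Stub standing in for a real VLM call: canned answers keyed by image id (assumption).
canned_answers = {
    "img_a": "Yes, a toddler is visible.",
    "img_b": "No.",
    "img_c": "yes",                  # e.g. dolls mistaken for a child (FP)
    "img_d": "No, only adults.",     # e.g. a teenager missed (FN)
}
ground_truth_children = {"img_a", "img_d"}

kept, removed = filter_dataset(canned_answers, lambda image_id, prompt: canned_answers[image_id])
recall, precision = recall_precision(removed, ground_truth_children)
```

In this toy run one child image is caught, one is missed, and one non-child image is flagged, so recall and precision both come out at 0.5. Because a missed child image stays in the dataset while a false positive only costs one discarded image, a filter of this kind would plausibly be tuned toward recall, which matches the priority stated in the TL;DR.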