A Survey on Pedophile Attribution Techniques for Online Platforms

Hiba Fallatah, Ching Suen, Olga Ormandjieva

TL;DR

This survey analyzes text-based attribution methods for online pedophiles on social media, focusing on how the size of the candidate (suspect) set and the length of the text affect performance. It categorizes the datasets, feature types, classification approaches, and evaluation metrics used in the field, noting the predominance of computational performance and the scarcity of real-victim data. It discusses methodological choices (profile-based vs. instance-based features, lexical and syntactic features, and open-set verification) and compares supervised, unsupervised, and distance-based methods. It also outlines open research problems and advocates multi-modal data, realistic datasets, and explainable AI to support safe, lawful deployment.

Abstract

Anonymity has made social media platforms increasingly popular among users of all ages. The availability of public Wi-Fi networks has facilitated access to a vast variety of online content, including social media applications. Although anonymity and ease of access make these platforms a convenient means of communication, they also make it difficult to manage the platforms and to protect their vulnerable users against sexual predators. An automated identification system that can attribute texts to the predators who wrote them would make a solution more attainable. In this survey, we review the methods of pedophile attribution used on social media platforms. We examine the effect of the size of the suspect set and the length of the text on the attribution task. Moreover, we review the most-used datasets, features, classification techniques, and performance measures for attributing sexual predators. We found that a few studies have proposed tools to mitigate the risk of online sexual predators, but none of these tools provides suspect attribution. Finally, we list several open research problems.

Paper Structure (15 sections, 3 figures, 2 tables)

This paper contains 15 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Architecture of authorship attribution.
  • Figure 2: Architecture of profile-based approach.
  • Figure 3: Architecture of instance-based approaches.