Linking Multi-Site Sex Ad Data at the Individual Level to Aid Counter-Trafficking Efforts

Nickolas K. Freeman, Gregory J. Bott, Burcu B. Keskin, Jason M. Parton, James J. Cochran

TL;DR

The paper tackles the challenge of linking multimodal sex ad data across multiple adult service websites (ASWs) to support counter-trafficking efforts. It introduces an end-to-end ad-linking pipeline that models ads as a graph, detects and filters erroneous connections in the giant component via a data-driven same-user classifier, and outputs actionable intelligence for law enforcement and nonprofits. A key contribution is a giant-component filtering approach that discards no data and is built on transformer-based text embeddings and perceptual hashes for images. The pipeline processes millions of ads in under an hour and, compared with an existing approach, is typically more computationally efficient and yields more individuals for whom actionable intelligence can be derived. Validation on an open multi-site dataset shows a much smaller noisy giant component and substantially more decomposed components, and the outputs have proven useful in practice for identifying potential trafficking victims. The paper places strong emphasis on reproducibility and real-world impact.

Abstract

The Internet facilitates sex trafficking through adult service websites (ASWs) that host online advertisements for sexual services (sex ads). Since the closure of the popular site Backpage.com, the ecosystem of ASWs has expanded to include multiple competing sites that are hosted outside US jurisdiction. Gaining intelligence for counter-trafficking efforts requires collecting, linking, and cleaning the data from multiple sites. However, high ad volumes, disparate data types, and the existence of generic and misappropriated data make this process challenging. We present an end-to-end process for linking sex ad data and filtering potentially erroneous links. Outputs of the developed process have been used to inform counter-trafficking operations that have helped identify more than 60 potential victims of sex trafficking, some of whom are getting help to transition out of the life. Our process leverages concepts and techniques from network science, information systems, and artificial intelligence to link ads across sites at the level of an individual or unique posting entity. Our approach is computationally efficient, allowing millions of ads to be processed in under an hour. A key component of our process is an edge filtering procedure that identifies and removes potentially erroneous links in a graph representation of sex ad data. A comparison of the proposed process to an existing approach shows that our process is typically more computationally efficient and yields substantial increases in the number of individuals for which we can derive actionable intelligence. The proposed process is an efficient and effective approach for transforming the high volumes of disparate data from sex ads into intelligence that can save lives. It has been refined over years of collaboration with practitioners and represents a strong foundation upon which further counter-trafficking tools can be built.
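
The graph view described above can be illustrated with a toy sketch. It is not the authors' implementation: the ad records, the attribute strings, the helper names `components` and `generic_attributes`, and the `max_ads` threshold are all hypothetical. In particular, the simple "shared by too many ads" rule below is only a stand-in for the paper's data-driven same-user classifier, which relies on text embeddings and perceptual image hashes. The sketch shows how one generic attribute (for example, a stock image) can fuse unrelated ads into a single large component, and how dropping those links, rather than the ads themselves, breaks the component apart.

```python
from collections import defaultdict


def components(ads, ignored=frozenset()):
    """Group ad ids that share at least one non-ignored attribute value."""
    parent = {ad_id: ad_id for ad_id in ads}

    def find(x):
        # Union-find lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    first_seen = {}  # attribute value -> first ad that carried it
    for ad_id, attrs in ads.items():
        for attr in attrs:
            if attr in ignored:
                continue  # link filtered out; the ad itself is kept
            if attr in first_seen:
                parent[find(ad_id)] = find(first_seen[attr])
            else:
                first_seen[attr] = ad_id

    groups = defaultdict(set)
    for ad_id in ads:
        groups[find(ad_id)].add(ad_id)
    # Largest component first.
    return sorted(groups.values(), key=len, reverse=True)


def generic_attributes(ads, max_ads):
    """ASSUMPTION: treat any attribute shared by more than max_ads ads as
    generic. This threshold rule is a placeholder, not the paper's classifier."""
    counts = defaultdict(int)
    for attrs in ads.values():
        for attr in attrs:
            counts[attr] += 1
    return {attr for attr, n in counts.items() if n > max_ads}


# Hypothetical toy data: "img:stock" plays the role of a reused stock photo.
ads = {
    "a1": {"phone:111", "img:aaa"},
    "a2": {"phone:111", "img:stock"},
    "a3": {"phone:222", "img:stock"},
    "a4": {"phone:222", "img:bbb"},
    "a5": {"phone:333", "img:stock"},
}

before = components(ads)                      # one component holding all ads
generic = generic_attributes(ads, max_ads=2)  # attributes judged generic
after = components(ads, ignored=generic)      # components after link filtering

print(len(before), len(after))
```

Every ad is still present after filtering; only the links through the generic attribute are ignored, which loosely mirrors the no-data-loss idea in the TL;DR.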

Paper Structure

This paper contains 24 sections, 10 figures, 6 tables, and 1 algorithm.

Figures (10)

  • Figure 1: Ad-Linking Pipeline
  • Figure 2: County Mapping Examples
  • Figure 3: Graph Representation Example
  • Figure 4: Giant Component Filtering - Graph Projection
  • Figure 5: Giant Component Filtering - Edge Removal
  • ...and 5 more figures