Hidden in Plain Sight: Detecting Illicit Massage Businesses from Mobility Data

Roya Shomali, Nick Freeman, Greg Bott, Iman Dayarian, Jason Parton

TL;DR

This paper tackles the challenge of detecting illicit massage businesses (IMBs) within dense urban landscapes. It combines anonymized mobility data with positive-unlabeled (PU) learning, which addresses the incomplete ground truth. The paper builds a three-stage pipeline. The pipeline extracts 28 mobility-based features and trains a PU Bagging classifier on confirmed illicit weeks (Illicit Active) and unlabeled weeks (Never-ASW). It then outputs weekly risk scores, which are aggregated to the establishment level to identify high-priority inspection targets. The approach achieves $\text{AUC}=0.97$ and $\text{AP}=0.84$ at the POI-week level. Targeting the top 10% of establishments captures about $53\%$ of known illicit operations, a 5.3-fold gain over random screening. Four operational signatures differentiate illicit from legitimate venues: stable demand, evening-dominated visits, short dwell times, and a strongly local clientele. The authors argue that these signatures are harder for operators to spoof than online signals. The framework offers a calibrated decision-support tool that helps law enforcement allocate inspections under budget constraints. The paper also discusses limitations and directions for future work in real-time monitoring and broader regulatory contexts.
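The TL;DR names PU Bagging but does not describe how it works. The sketch below illustrates the general bagging scheme for positive-unlabeled data:

1. Each round draws a bootstrap sample of unlabeled rows and treats those rows as provisional negatives.
2. It fits a base learner against all labeled positives.
3. It scores only the unlabeled rows left out of that sample.

Each unlabeled row's final score is its average over the rounds in which it was out of bag. The nearest-centroid base learner, the function names, and the default parameters are hypothetical simplifications chosen to keep the example dependency-free. This summary does not state the paper's actual base classifier or settings.

```python
import math
import random
from typing import List, Optional, Sequence

Vector = List[float]


def _centroid(rows: Sequence[Vector]) -> Vector:
    """Component-wise mean of a non-empty collection of feature vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def _sq_dist(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def _centroid_score(x: Vector, pos_c: Vector, neg_c: Vector) -> float:
    """Hypothetical base learner: logistic of how much closer x is to the positive centroid."""
    margin = _sq_dist(x, neg_c) - _sq_dist(x, pos_c)
    margin = max(-50.0, min(50.0, margin))  # clip to avoid math.exp overflow
    return 1.0 / (1.0 + math.exp(-margin))


def pu_bagging_scores(
    positives: Sequence[Vector],
    unlabeled: Sequence[Vector],
    n_estimators: int = 50,
    sample_size: Optional[int] = None,
    seed: int = 0,
) -> List[float]:
    """Out-of-bag PU bagging score for each unlabeled row (higher = more positive-like)."""
    rng = random.Random(seed)
    k = sample_size or len(positives)  # bag size defaults to |P| (an assumption)
    pos_c = _centroid(positives)  # every round uses all labeled positives
    totals = [0.0] * len(unlabeled)
    counts = [0] * len(unlabeled)
    for _ in range(n_estimators):
        # Bootstrap-sample unlabeled rows and treat them as provisional negatives.
        bag = [rng.randrange(len(unlabeled)) for _ in range(k)]
        in_bag = set(bag)
        neg_c = _centroid([unlabeled[i] for i in bag])
        # Score only the unlabeled rows left out of this bag (out-of-bag prediction).
        for i, x in enumerate(unlabeled):
            if i not in in_bag:
                totals[i] += _centroid_score(x, pos_c, neg_c)
                counts[i] += 1
    # Average over the rounds in which each row was out of bag (NaN if it never was).
    return [t / c if c else float("nan") for t, c in zip(totals, counts)]


# Toy usage: U[0] is a hidden positive; the other unlabeled rows resemble negatives.
P = [[1.0, 1.0], [0.9, 1.1], [1.1, 0.9]]
U = [[1.0, 1.0], [0.0, 0.0], [0.1, -0.1], [-0.1, 0.1], [0.0, 0.1]]
scores = pu_bagging_scores(P, U, n_estimators=200, seed=1)
```

In the toy data, the hidden positive `U[0]` receives a higher averaged score than every negative-like unlabeled row. Out-of-bag averaging matters here because unlabeled rows (the Never-ASW weeks, in the paper's terms) are never scored by a model that was told they are negative.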

Abstract

Illicit massage businesses (IMBs) masquerade as legitimate massage parlors while facilitating commercial sex and human trafficking. With an estimated 9,000 establishments, IMBs constitute one of the largest segments of indoor sex trafficking in the United States. Law enforcement must identify these businesses within a dense population of lawful establishments. However, investigative resources are limited, and the illicit status of each location is unknown until inspection. Detection methods based on online reviews offer some insight, yet operators can manipulate these signals, leaving covert establishments undetected. Mobility data offers an alternative to online signals and covers establishments that avoid digital visibility entirely. We derive features from mobility data spanning temporal visitation patterns, dwell times, visitor catchment areas, and demand stability. Confirmed labels exist only for establishments identified through advertising platforms, so we employ positive-unlabeled learning to address this label asymmetry. The model achieves 0.97 AUC and 0.84 Average Precision at the business-week level. Four operational signatures characterize high-risk establishments: demand consistency, evening-concentrated visits, compressed service durations, and locally drawn clientele. The model produces a risk score for each business-week observation. When these scores are aggregated to the business level, prioritizing the highest-risk 10% of massage establishments captures 53% of known illicit operations. This is a 5.3-fold improvement over uninformed inspection. We develop a decision-support system that produces calibrated prioritization scores for law enforcement, enabling investigators to concentrate inspections on the highest-risk venues. The operational signatures may resist strategic manipulation because they reflect actual operations rather than online signals that operators can control.
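The 5.3-fold figure comes from comparing the model's capture rate with random selection. Inspecting a random 10% of establishments would, in expectation, capture 10% of known illicit ones, so the lift is $0.53 / 0.10 = 5.3$. The sketch below computes both quantities from weekly scores. The mean-of-weeks aggregation rule, the function names, and the toy business identifiers are hypothetical. This summary does not say how the paper aggregates POI-week scores.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, Iterable, List, Set, Tuple


def aggregate_by_business(week_scores: Iterable[Tuple[str, float]]) -> Dict[str, float]:
    """Collapse POI-week risk scores to one score per business (mean is an assumed rule)."""
    by_poi: Dict[str, List[float]] = defaultdict(list)
    for poi_id, score in week_scores:
        by_poi[poi_id].append(score)
    return {poi_id: mean(scores) for poi_id, scores in by_poi.items()}


def capture_and_lift(
    business_scores: Dict[str, float],
    known_illicit: Set[str],
    budget_frac: float = 0.10,
) -> Tuple[float, float]:
    """Share of known illicit businesses in the top budget_frac, and lift over random picks."""
    ranked = sorted(business_scores, key=business_scores.get, reverse=True)
    k = max(1, round(budget_frac * len(ranked)))  # number of inspections the budget allows
    inspected = set(ranked[:k])
    capture = len(inspected & known_illicit) / len(known_illicit)
    # Random inspection of k of n businesses captures k/n of known cases in expectation.
    lift = capture / (k / len(ranked))
    return capture, lift


# Toy usage: 20 hypothetical businesses, two weekly scores each.
weeks = [("b00", 0.95), ("b00", 0.85), ("b01", 0.80), ("b01", 0.70)]
weeks += [(f"b{i:02d}", s) for i in range(2, 20) for s in (0.10, 0.20)]
biz = aggregate_by_business(weeks)
capture, lift = capture_and_lift(biz, known_illicit={"b00", "b01", "b05", "b10"})
```

With the toy data, the top 2 of 20 businesses contain 2 of the 4 known illicit ones. This gives a capture rate of 0.5 and a lift of 5.0, the same arithmetic that turns 53% capture at a 10% budget into a 5.3-fold gain.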

Paper Structure (57 sections, 18 equations, 9 figures, 11 tables, 1 algorithm)

Figures (9)

  • Figure 1: Data Merging
  • Figure 2: Training Pipeline
  • Figure 3: Geographic Distribution of POI-Weeks
  • Figure 4: Score Distributions by Observation Category
  • Figure 5: Demand Stability Metrics: Difference Between High-Risk and Low-Risk Establishments
  • ...and 4 more figures