Emergent AI Surveillance: Overlearned Person Re-Identification and Its Mitigation in Law Enforcement Context

An Thi Nguyen, Radina Stoykova, Eric Arazo

TL;DR

This work demonstrates that generic instance search models trained on non-human data can develop emergent person re-identification capabilities, posing privacy and governance challenges. It evaluates two mitigations, index exclusion and confusion loss, and shows that in combination they can reduce person re-ID accuracy to below 2% while maintaining 82% of retrieval performance for non-person objects. However, the defenses remain vulnerable to circumvention using partial-person queries and accessories, which exposes weaknesses in current safeguards. The study highlights urgent regulatory questions about how to classify and control systems with emergent identification capabilities, and the need for robust technical standards to prevent function creep in AI-enabled surveillance.

Abstract

Generic instance search models can dramatically reduce the manual effort required to analyze vast surveillance footage during criminal investigations by retrieving specific objects of interest to law enforcement. However, our research reveals an unintended emergent capability: through overlearning, these models can single out specific individuals even when trained on datasets without human subjects. This capability raises concerns regarding identification and profiling of individuals based on their personal data, while there is currently no clear standard on how de-identification can be achieved. We evaluate two technical safeguards to curtail a model's person re-identification capacity: index exclusion and confusion loss. Our experiments demonstrate that combining these approaches can reduce person re-identification accuracy to below 2% while maintaining 82% of retrieval performance for non-person objects. However, we identify critical vulnerabilities in these mitigations, including potential circumvention using partial person images. These findings highlight urgent regulatory questions at the intersection of AI governance and data protection: How should we classify and regulate systems with emergent identification capabilities? And what technical standards should be required to prevent identification capabilities from developing in seemingly benign applications?
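
The abstract names index exclusion as a safeguard but does not describe its mechanics. As a rough illustration only, the sketch below assumes one plausible reading: detections that an upstream detector labels as a person are dropped before they enter the search index, so they can never be returned as matches. The `Detection` record, the `build_index` and `search` helpers, the label names, and the toy embeddings are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Detection:
    """Hypothetical record for one detected object crop."""
    frame_id: int
    label: str               # class name from an upstream detector (assumed)
    embedding: list[float]   # feature vector from the instance search model


def build_index(detections, excluded_labels=frozenset({"person"})):
    """Assumed form of index exclusion: skip detections with an excluded label."""
    return [d for d in detections if d.label not in excluded_labels]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(index, query_embedding, top_k=3):
    """Return the top_k indexed detections most similar to the query."""
    ranked = sorted(
        index,
        key=lambda d: cosine(d.embedding, query_embedding),
        reverse=True,
    )
    return ranked[:top_k]


# Toy data: one person crop and two non-person crops.
detections = [
    Detection(1, "person", [0.9, 0.1, 0.0]),
    Detection(2, "backpack", [0.1, 0.9, 0.1]),
    Detection(3, "car", [0.0, 0.2, 0.9]),
]

index = build_index(detections)
# Query with the person's own embedding: the person crop is not in the index,
# so only non-person detections can be returned.
results = search(index, [0.9, 0.1, 0.0], top_k=2)
```

Under this assumed design, a crop labelled as something other than a person (a backpack, for instance) stays searchable, which is consistent with the abstract's caveat that partial person images may circumvent the mitigation.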

Paper Structure

This paper contains 29 sections, 4 equations, 4 figures, 5 tables.

Figures (4)

  • Figure 1: Instance search pipeline with mitigation strategies, adapted from 10.1145/3617233.3617249.
  • Figure 2: Examples where the confusion-loss model fails to prevent re-identification. The query appears in the left-most column; green borders denote correct matches. The visible faces are blurred for this figure.
  • Figure 3: UMAP visualization comparing YouTube-VIS embeddings from models trained with (a) standard MS loss on non-human subjects and (b) confusion loss. Colors represent different individuals; confusion loss effectively disperses person embeddings, reducing re-identification.
  • Figure 4: AttnLRP relevance heatmaps comparing models trained with standard MS loss versus confusion loss. Confusion loss reduces the relevance of identifiable person features, while preserving relevance for general object recognition.