CityGuard: Graph-Aware Private Descriptors for Bias-Resilient Identity Search Across Urban Cameras

Rong Fu, Wenxin Zhang, Yibo Meng, Jia Yee Tan, Jiaxuan Lu, Rui Lu, Jiekai Wu, Zhaolu Kang, Simon Fong

TL;DR

CityGuard is a topology-aware transformer for privacy-preserving identity retrieval in decentralized surveillance. It produces descriptors robust to viewpoint variation, occlusion, and domain shift, and it enables a tunable balance between privacy and utility under rigorous differential-privacy accounting.

Abstract

City-scale person re-identification across distributed cameras must handle severe appearance changes from viewpoint, occlusion, and domain shift while complying with data protection rules that prevent sharing raw imagery. We introduce CityGuard, a topology-aware transformer for privacy-preserving identity retrieval in decentralized surveillance. The framework integrates three components. First, a dispersion-adaptive metric learner adjusts instance-level margins according to feature spread, increasing intra-class compactness. Second, spatially conditioned attention injects coarse geometry, such as GPS or deployment floor plans, into graph-based self-attention, enabling projectively consistent cross-view alignment without survey-grade calibration. Third, differentially private embedding maps are coupled with compact approximate indexes to support secure and cost-efficient deployment. Together, these designs produce descriptors robust to viewpoint variation, occlusion, and domain shift, and they enable a tunable balance between privacy and utility under rigorous differential-privacy accounting. Experiments on Market-1501 and additional public benchmarks, complemented by database-scale retrieval studies, show consistent gains in retrieval precision and query throughput over strong baselines, confirming the practicality of the framework for privacy-critical urban identity matching.
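
As a rough illustration of the differentially private embedding release mentioned in the abstract, the sketch below clips a descriptor to a fixed $L_2$ norm and adds Gaussian noise scaled to the resulting sensitivity. It uses the classical Gaussian-mechanism calibration, which is valid only for $0<\varepsilon<1$; the paper's own accounting may differ. The function names, the clipping bound `max_norm`, and the replace-one sensitivity of `2 * max_norm` are assumptions made for this example, not details taken from the paper.

```python
import math
import random


def clip_l2(vec, max_norm):
    """Scale vec down so that its L2 norm is at most max_norm."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm <= max_norm or norm == 0.0:
        return list(vec)
    scale = max_norm / norm
    return [x * scale for x in vec]


def gaussian_sigma(sensitivity, epsilon, delta):
    """Noise scale from the classical Gaussian-mechanism bound.

    sigma = sensitivity * sqrt(2 * ln(1.25 / delta)) / epsilon,
    which gives (epsilon, delta)-DP for 0 < epsilon < 1.
    """
    if not (0.0 < epsilon < 1.0):
        raise ValueError("classical bound requires 0 < epsilon < 1")
    if not (0.0 < delta < 1.0):
        raise ValueError("delta must lie in (0, 1)")
    return sensitivity * math.sqrt(2.0 * math.log(1.25 / delta)) / epsilon


def privatize(vec, max_norm, epsilon, delta, rng):
    """Clip a descriptor, then add i.i.d. Gaussian noise per coordinate.

    Assumption: two clipped descriptors differ by at most 2 * max_norm
    in L2 distance, so that value is used as the sensitivity.
    """
    clipped = clip_l2(vec, max_norm)
    sigma = gaussian_sigma(2.0 * max_norm, epsilon, delta)
    return [x + rng.gauss(0.0, sigma) for x in clipped]


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed only to make the demo repeatable
    descriptor = [3.0, 4.0, 0.0]
    print(privatize(descriptor, max_norm=1.0, epsilon=0.5, delta=1e-5, rng=rng))
```

Lowering `epsilon` or `delta` raises the noise scale, which is the privacy-utility trade-off the abstract refers to: stronger guarantees come at the cost of noisier descriptors.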

Paper Structure

This paper contains 66 sections, 9 theorems, 93 equations, 12 figures, 13 tables, 1 algorithm.

Key Result

Theorem A.1

Assume $\ell\in[0,1]$ and let $S$ be an i.i.d. sample of size $n$ drawn from $P_i$. For any prior $Q$ independent of $S$ and any posterior $P$, with probability at least $1-\delta$ over the draw of $S$, where $R(P)$ is the expected population risk under posterior $P$, where $\widehat{R}_n(P)$ is the empirical risk on $S$, where $n$ is the sample size, where $\delta\in(0,1)$ is a confidence parame

Figures (12)

  • Figure 1: Overview of the CityGuard framework for bias-resilient, privacy-preserving identity search. The process begins with Topology-Aware Geometry Encoding, where camera coordinates and rotations are mapped to a spatial adjacency graph. The Geometry-Conditioned Backbone then fuses multi-scale features and refines them through a Temporal Graph Network (TGN) to capture cross-camera motion cues. Centrally, the Dispersion-Aware Metric Calibration utilizes an Adaptive Class-Tolerant (ACT) Loss to dynamically adjust margins $\gamma_i$ based on per-identity distribution divergence. Global consistency is enforced via a Transport-Regularized Retrieval objective using Sinkhorn iterations. Finally, the Differentially Private Embedding Release applies a Gaussian mechanism calibrated by $L_2$-sensitivity to produce privatized descriptors $\widetilde{f}$ for secure, large-scale indexing.
  • Figure 2: Camera topology (GPS only): top-down 2D layout of camera nodes with edge thickness encoding the row-stochastic affinity $A_{ij}$. This panel visualizes affinity derived solely from pairwise GPS distances.
  • Figure 3: Camera topology (GPS + Rotation): top-down 2D layout where $A_{ij}$ incorporates both GPS distance and heading alignment. Rotation-aware alignment increases weights between cameras with similar headings.
  • Figure 4: Attention matrix $A$ computed from visual similarity alone (no geometric bias). Rows correspond to source cameras and columns to target cameras; intensity indicates attention weight before incorporation of the geometric term $B_{\mathrm{geom}}$.
  • Figure 5: Attention matrix $A$ after adding geometric bias $B_{\mathrm{geom}}$. Compared with Figure 4, physical neighbors and heading-aligned cameras receive visibly higher attention weights, illustrating the effect of the geometry-conditioned term.
  • ...and 7 more figures
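
The captions for Figures 2 to 5 describe a row-stochastic camera affinity $A_{ij}$ built from GPS distance and heading alignment, and an attention matrix that gains an additive geometric term $B_{\mathrm{geom}}$. The sketch below shows one way such quantities could be computed. The exact kernel is not given in the captions, so the form used here (negative scaled distance plus a cosine heading term, followed by a row-wise softmax) and the parameters `length_scale` and `heading_weight` are assumptions for illustration only. Camera positions are taken to be planar coordinates in metres.

```python
import math


def row_softmax(rows):
    """Apply a numerically stable softmax to each row; rows then sum to 1."""
    out = []
    for row in rows:
        peak = max(row)
        exps = [math.exp(v - peak) for v in row]
        total = sum(exps)
        out.append([e / total for e in exps])
    return out


def geometric_bias(positions, headings, length_scale, heading_weight):
    """Hypothetical B_geom: closer and similarly oriented cameras score higher."""
    n = len(positions)
    bias = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            dist = math.dist(positions[i], positions[j])
            align = math.cos(headings[i] - headings[j])
            bias[i][j] = -dist / length_scale + heading_weight * align
    return bias


def affinity(positions, headings, length_scale=50.0, heading_weight=0.0):
    """Row-stochastic affinity; heading_weight=0 gives the GPS-only variant."""
    return row_softmax(
        geometric_bias(positions, headings, length_scale, heading_weight)
    )


def biased_attention(scores, bias):
    """Add the geometric bias to visual similarity scores, then normalize rows."""
    summed = [[s + b for s, b in zip(s_row, b_row)]
              for s_row, b_row in zip(scores, bias)]
    return row_softmax(summed)


if __name__ == "__main__":
    positions = [(0.0, 0.0), (10.0, 0.0), (200.0, 0.0)]
    headings = [0.0, 0.0, math.pi]  # radians
    for row in affinity(positions, headings, heading_weight=1.0):
        print([round(v, 4) for v in row])
```

In this sketch the diagonal is kept, so each camera also attends to itself; masking self-loops before the softmax would be an equally reasonable choice.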

Theorems & Definitions (18)

  • Theorem A.1: Compact PAC-Bayes bound
  • proof
  • Lemma A.2: Margin bounds and Lipschitz continuity of logits
  • proof
  • Proposition A.3: Feature-norm control under weight decay
  • proof
  • Proposition A.4: Spectral radius of the row-stochastic attention matrix
  • proof
  • Proposition A.5: Non-amplification of message passing
  • proof
  • ...and 8 more
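
For orientation, the standard argument behind results of the kind named in Propositions A.4 and A.5 is sketched here. It assumes only that $A$ is row-stochastic, that is, $A_{ij}\ge 0$ and $\sum_j A_{ij}=1$ for every $i$; the paper's own statements and proofs may use a different norm or carry additional terms.

$$
A\mathbf{1}=\mathbf{1}\quad\Longrightarrow\quad \rho(A)\ge 1,
$$

$$
\|Ax\|_\infty=\max_i\Big|\sum_j A_{ij}x_j\Big|\le\max_i\sum_j A_{ij}\,|x_j|\le\|x\|_\infty,
$$

$$
\rho(A)\le\|A\|_\infty=\max_i\sum_j |A_{ij}|=1 .
$$

Combining the first and third lines gives $\rho(A)=1$, and the second line says that one round of message passing with $A$ cannot increase the largest entry of $x$ in absolute value.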