DART$^3$: Leveraging Distance for Test Time Adaptation in Person Re-Identification

Rajarshi Bhattacharya, Shakeeb Murtaza, Christian Desrosiers, Jose Dolz, Maguelonne Heritier, Eric Granger

TL;DR

DART$^3$ tackles camera-bias-induced domain shifts in person ReID by replacing entropy-based test-time objectives with a distance-based retrieval objective tailored to metric learning. It introduces lightweight, camera-conditioned external scale and shift parameters that adjust embeddings without retraining the source model, enabling black-box or hybrid deployment and initialization from per-camera statistics. Across multiple datasets and backbones, DART$^3$ and its LITE variant consistently outperform state-of-the-art TTA baselines, particularly for unseen cameras, demonstrating practical online adaptation for expanding camera networks. This approach offers a scalable, source-free solution for robust ReID under real-world camera biases, with notable reductions in parameter count and inference overhead in the LITE version.

Abstract

Person re-identification (ReID) models are known to suffer from camera bias, where learned representations cluster according to camera viewpoints rather than identity. This leads to significant performance degradation under inter-camera domain shifts in real-world surveillance systems when new cameras are added to a camera network. State-of-the-art test-time adaptation (TTA) methods, largely designed for classification tasks, rely on classification entropy-based objectives that fail to generalize well to ReID, making them unsuitable for tackling camera bias. In this paper, we introduce DART$^3$, a TTA framework specifically designed to mitigate camera-induced domain shifts in person ReID. DART$^3$ (Distance-Aware Retrieval Tuning at Test Time) leverages a distance-based objective that aligns better with image retrieval tasks like ReID by exploiting the correlation between nearest-neighbor distance and prediction error. Unlike prior ReID-specific domain adaptation methods, DART$^3$ requires no source data, architectural modifications, or retraining, and can be deployed in both fully black-box and hybrid settings. Empirical evaluations on multiple ReID benchmarks indicate that DART$^3$ and DART$^3$ LITE, a lightweight alternative to the approach, consistently outperform state-of-the-art TTA baselines, making them a viable option for online learning to mitigate the adverse effects of camera bias.
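
The two ingredients named above, camera-conditioned scale and shift parameters and a nearest-neighbor distance signal, can be illustrated with a small sketch. This is not the authors' implementation. It assumes the adjusted embedding has the form `(z - shift) / scale`, with the shift and scale initialized from the per-camera mean and standard deviation. The exact objective is not stated in this summary, so the sketch only computes the nearest-neighbor Euclidean distance that such an objective would build on. All function names are hypothetical.

```python
import math
from collections import defaultdict


def camera_stats(features, cam_ids, eps=1e-6):
    """Per-camera, per-dimension mean and standard deviation of embeddings."""
    by_cam = defaultdict(list)
    for z, cam in zip(features, cam_ids):
        by_cam[cam].append(z)
    stats = {}
    for cam, zs in by_cam.items():
        n, dim = len(zs), len(zs[0])
        mean = [sum(z[j] for z in zs) / n for j in range(dim)]
        # eps keeps the scale strictly positive for constant dimensions.
        std = [
            math.sqrt(sum((z[j] - mean[j]) ** 2 for z in zs) / n) + eps
            for j in range(dim)
        ]
        stats[cam] = (mean, std)
    return stats


def adjust(z, shift, scale):
    """Assumed adjustment: subtract the shift, then divide by the scale."""
    return [(x - m) / s for x, m, s in zip(z, shift, scale)]


def nearest_neighbor_distance(query, gallery):
    """Euclidean distance from a query embedding to its closest gallery embedding."""
    return min(math.dist(query, g) for g in gallery)


# Toy data: two cameras whose embeddings differ only by an offset and a scale.
features = [[1.0, 2.0], [3.0, 6.0], [10.0, 0.0], [14.0, 4.0]]
cam_ids = ["cam_a", "cam_a", "cam_b", "cam_b"]

stats = camera_stats(features, cam_ids)
adjusted = [adjust(z, *stats[cam]) for z, cam in zip(features, cam_ids)]

raw_dist = nearest_neighbor_distance(features[0], features[2:])
adj_dist = nearest_neighbor_distance(adjusted[0], adjusted[2:])
print(raw_dist, adj_dist)
```

In this toy setup the cross-camera nearest-neighbor distance shrinks once the per-camera offset and scale are removed. In an adaptation setting the shift and scale would then be updated at test time rather than kept fixed at these initial values.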

Paper Structure

This paper contains 27 sections, 17 equations, 11 figures, 7 tables, 2 algorithms.

Figures (11)

  • Figure 1: A failure case where the source model misidentifies a query image due to camera bias, favoring views with similar pose, lighting, and perspective. (a)–(e) show t-SNE visualizations of the top feature matches for different test-time adaptation methods. Unseen camera (a) refers to a query sample taken from a camera not used for training the source model. TENT [wang2020tent] (b) over-condenses erroneous clusters; TEMP [adachi2024test] (c) aligns features angularly but still misidentifies; Camera Normalization [song2025exploring] (d) reduces bias towards background artifacts yet fails to recover identity. Our method (e) achieves correct retrieval despite view and lighting differences. (f) shows the top predictions for each adaptation method.
  • Figure 2: Change in error rate versus nearest-neighbor Euclidean distance, nearest-neighbor cosine distance, and entropy, over a range of values. The error rate grows more uniformly with Euclidean distance over the range. In contrast, with cosine distance the error rate rises abruptly at high values, and with entropy it behaves unpredictably.
  • Figure 3: Normalized Mutual Information (NMI) between feature clusters and Camera IDs. NMI serves as a measure of camera bias and our method demonstrates the lowest scores compared to existing methodologies.
  • Figure 4: We present DART$^3$, a test-time adaptation pipeline designed to mitigate camera bias in ReID models when exposed to unseen camera domains. We hypothesize that a true unbiased representation $\mathbf{z}^*_i$ exists and can be estimated by scale and shift parameters. For the adaptation, these parameters can be initialized as the mean and standard deviation of features accumulated for a specific camera. Finally, we show two variants of our method, DART$^3$ and DART$^3$ LITE, based on whether we treat the source model as a black-box entity.
  • Figure 5: Trends in performance (mAP) with respect to (a) the values of $k$ and $\tau$, and the number of optimization steps per batch for (b) non-episodic training and (c) episodic training. (d) Comparison of our method with grounding samples in a batch.
  • ...and 6 more figures
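
The caption of Figure 3 uses Normalized Mutual Information (NMI) between feature clusters and camera IDs as a measure of camera bias. The sketch below shows how such a score can be computed from two label lists. It assumes normalization by the arithmetic mean of the two entropies, which is one common convention; this summary does not say which normalization or clustering method the paper uses. The function names are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of a label assignment."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def mutual_information(a, b):
    """Mutual information (in nats) between two label assignments."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    return sum(
        (c / n) * math.log(c * n / (count_a[x] * count_b[y]))
        for (x, y), c in count_ab.items()
    )


def nmi(a, b):
    """NMI with arithmetic-mean normalization (an assumption of this sketch)."""
    h_a, h_b = entropy(a), entropy(b)
    if h_a == 0.0 or h_b == 0.0:
        # Convention chosen here: a constant labelling carries no information.
        return 0.0
    return mutual_information(a, b) / ((h_a + h_b) / 2)


camera_ids = ["cam_a", "cam_a", "cam_b", "cam_b"]

# Clusters that coincide with cameras: maximal camera bias.
biased_clusters = [0, 0, 1, 1]
# Clusters that are independent of cameras: no camera bias.
unbiased_clusters = [0, 1, 0, 1]

biased_score = nmi(biased_clusters, camera_ids)
unbiased_score = nmi(unbiased_clusters, camera_ids)
print(biased_score, unbiased_score)
```

A score near 1 means the clusters follow camera identity closely, and a score near 0 means cluster membership says little about the camera, which is why a lower NMI is read as lower camera bias.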