Causality and "In-the-Wild" Video-Based Person Re-ID: A Survey
Md Rashidunnabi, Kailash Hambarde, Hugo Proença
TL;DR
This survey argues that causal reasoning is essential for robust video-based person Re-ID in the wild, where correlation-based models fail under domain shifts and appearance changes. It reviews Structural Causal Models, counterfactual reasoning, and interventional training, and presents a taxonomy spanning generative (causal) disentanglement, domain-invariant modeling, and causal transformers. It analyzes state-of-the-art models such as DIR-ReID, IS-GAN, and UCT, proposes causal-specific robustness metrics, and discusses open challenges in scalability, fairness, privacy, and interpretability. Finally, it outlines future directions that combine causal modeling with efficient architectures and self-supervised learning to enable accurate, fair, and privacy-preserving Re-ID in real deployments.
Abstract
Video-based person re-identification (Re-ID) remains brittle in real-world deployments despite impressive benchmark performance. Most existing models rely on spurious correlations with cues such as clothing, background, or lighting, and these correlations fail to generalize across domains, viewpoints, and temporal variations. This survey examines the emerging role of causal reasoning as a principled alternative to traditional correlation-based approaches in video-based Re-ID. We provide a structured and critical analysis of methods that leverage structural causal models, interventions, and counterfactual reasoning to isolate identity-specific features from confounding factors. The survey is organized around a novel taxonomy of causal Re-ID methods that spans generative disentanglement, domain-invariant modeling, and causal transformers. We review current evaluation metrics and introduce causal-specific robustness measures. In addition, we assess the practical challenges in scalability, fairness, interpretability, and privacy that must be addressed for real-world adoption. Finally, we identify open problems and outline future research directions that integrate causal modeling with efficient architectures and self-supervised learning. This survey aims to establish a coherent foundation for causal video-based person Re-ID and to catalyze the next phase of research in this rapidly evolving field.
