Causality and "In-the-Wild" Video-Based Person Re-ID: A Survey

Md Rashidunnabi, Kailash Hambarde, Hugo Proença

TL;DR

This survey argues that causality is essential for robust video-based person Re-ID in the wild, where correlation-based models fail under domain shifts and appearance changes. It reviews Structural Causal Models, counterfactual reasoning, and interventional training, presenting a taxonomy that includes causal disentanglement and causal transformers. It analyzes cutting-edge models such as DIR-ReID, IS-GAN, and UCT, and proposes causal-specific robustness metrics, while also discussing challenges like scalability, fairness, privacy, and interpretability. The paper outlines future directions that combine causal modeling with efficient architectures and self-supervised learning to enable accurate, fair, and privacy-preserving Re-ID in real deployments.

Abstract

Video-based person re-identification (Re-ID) remains brittle in real-world deployments despite impressive benchmark performance. Most existing models rely on superficial correlations such as clothing, background, or lighting that fail to generalize across domains, viewpoints, and temporal variations. This survey examines the emerging role of causal reasoning as a principled alternative to traditional correlation-based approaches in video-based Re-ID. We provide a structured and critical analysis of methods that leverage structural causal models, interventions, and counterfactual reasoning to isolate identity-specific features from confounding factors. The survey is organized around a novel taxonomy of causal Re-ID methods that spans generative disentanglement, domain-invariant modeling, and causal transformers. We review current evaluation metrics and introduce causal-specific robustness measures. In addition, we assess practical challenges of scalability, fairness, interpretability, and privacy that must be addressed for real-world adoption. Finally, we identify open problems and outline future research directions that integrate causal modeling with efficient architectures and self-supervised learning. This survey aims to establish a coherent foundation for causal video-based person Re-ID and to catalyze the next phase of research in this rapidly evolving domain.

Paper Structure

This paper contains 25 sections, 2 equations, 9 figures, 4 tables.

Figures (9)

  • Figure S1: Why video-based person Re-ID is hard. The same individual appears under six nuisance factors—viewpoint, lighting, rain blur, pose, clothing change, and accessory occlusion—illustrating the need for causal disentanglement rather than correlation-driven learning.
  • Figure S2: Benefits of Causal Disentanglement in Video-Based Person Re-ID. Causal reasoning improves cross-domain robustness, occlusion resilience, fairness, privacy, and interpretability—key for real-world Re-ID systems.
  • Figure S3: Traditional video-based person Re-ID pipeline. The diagram summarises the classical modules that transform a tracklet into a fixed-length identity embedding: frame-level CNN, temporal modelling (RNN / 3-D CNN), pooling–attention, generative augmentation, and domain-invariant learning.
  • Figure S4: Correlation versus causation in Re-ID. The figure contrasts a correlation-based model, whose heatmap (top right) overwhelmingly highlights the backpack and surrounding background, with a causation-based model, whose heatmap (bottom right) instead focuses on the upper back, neck, and head posture as identity-intrinsic features. The violin plot is illustrative: its 32% versus 8% median background overlap values are not experimental results but are meant to emphasize how causal training de-emphasizes spurious context. As discussed in the papers below, this shift yields more robust and generalizable person re-identification performance than correlation-only approaches.
  • Figure S5: Comparing Correlation vs. Causation in Re-ID. The left side shows the traditional correlation-based approach, in which confounders ($Z$) create shortcuts between appearance ($X$) and prediction ($Y$), leading to spurious correlations. The model is trained on the observational distribution $P(Y|X)$, which makes it vulnerable to distribution shift. The right side illustrates the causal/interventional approach, which blocks the backdoor path through the confounders by intervening with $do(X)$. By targeting $P(Y|do(X))$, the causal model isolates the effect of appearance ($X$) on the prediction ($Y$) from the influence of $Z$, resulting in more robust predictions under varying conditions.
  • ...and 4 more figures
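
The contrast in Figure S5 between $P(Y|X)$ and $P(Y|do(X))$ can be made concrete with a toy example. The sketch below is not taken from the survey: it assumes a binary confounder $Z$, a binary appearance cue $X$, and a binary prediction $Y$, with made-up probabilities in which $Y$ depends only on $Z$. The function names and numbers are hypothetical. It computes the observational quantity by weighting each value of $Z$ by $P(Z|X)$, and the interventional quantity by the standard backdoor adjustment, which weights by $P(Z)$ instead.

```python
# Toy discrete model with Z -> X and Z -> Y; X has no real effect on Y.
# All probabilities are illustrative assumptions, not values from the survey.

P_Z = {0: 0.5, 1: 0.5}                       # P(Z = z)
P_X1_GIVEN_Z = {0: 0.2, 1: 0.8}              # P(X = 1 | Z = z)
P_Y1_GIVEN_XZ = {(0, 0): 0.1, (1, 0): 0.1,   # P(Y = 1 | X = x, Z = z)
                 (0, 1): 0.9, (1, 1): 0.9}


def p_x_given_z(x, z):
    """P(X = x | Z = z) for binary x."""
    p1 = P_X1_GIVEN_Z[z]
    return p1 if x == 1 else 1.0 - p1


def observational(x):
    """P(Y=1 | X=x): each z is weighted by the posterior P(z | x)."""
    joint = {z: p_x_given_z(x, z) * P_Z[z] for z in P_Z}
    p_x = sum(joint.values())
    return sum(P_Y1_GIVEN_XZ[(x, z)] * joint[z] / p_x for z in P_Z)


def interventional(x):
    """P(Y=1 | do(X=x)) by backdoor adjustment: each z is weighted by P(z)."""
    return sum(P_Y1_GIVEN_XZ[(x, z)] * P_Z[z] for z in P_Z)


# Apparent effect of X on Y in observed data vs. under intervention.
obs_gap = observational(1) - observational(0)
do_gap = interventional(1) - interventional(0)
print(f"observational gap:  {obs_gap:.2f}")
print(f"interventional gap: {do_gap:.2f}")
```

With these assumed numbers the observational gap is large even though $X$ has no effect on $Y$, while the interventional gap vanishes. This is the kind of shortcut that the figure attributes to correlation-based training.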