Detecting Object Tracking Failure via Sequential Hypothesis Testing
Alejandro Monroy Muñoz, Rajeev Verma, Alexander Timans
TL;DR
This work addresses the lack of formal safety guarantees in real-time online object tracking by introducing a sequential hypothesis-testing framework, based on e-processes, that detects tracking failures with anytime-valid Type-I error control. Tracking failure is formulated as a test of $H_0: \mathbb{E}_{P_t}[M_t|\mathcal{F}_{t-1}] \ge \epsilon$ for all $t$ against $H_1: \exists t$ with $\mathbb{E}_{P_t}[M_t|\mathcal{F}_{t-1}] < \epsilon$, where $M_t$ is a tracking-quality signal and $\epsilon$ a quality threshold. The framework monitors the e-process $X_t = \prod_{i=1}^t [1 + \lambda_i(\epsilon - M_i)]$ with betting rates $\lambda_i$ and raises an alert as soon as $X_t \ge \frac{1}{\alpha}$; by Ville's inequality, the probability of a false alert under $H_0$ is then at most $\alpha$. Both supervised and unsupervised tracking-quality signals are supported, and two betting-rate strategies, aGRAPA and SF-OGD, are offered to maximize the growth of $X_t$ under failure. Empirically, the framework keeps false-positive rates (FPRs) at or below the chosen level and achieves low detection delays across two tracker families (KCF, SiamFC) and four benchmarks, while remaining model-agnostic and lightweight to deploy. This sequential safety layer lets deployed trackers monitor their reliability in real time without retraining, providing a principled path toward safer autonomous vision systems.
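The alert rule above can be illustrated with a minimal Python sketch. It is not the authors' implementation: it assumes quality scores $M_t \in [0, 1]$ and, for simplicity, uses a constant betting rate in place of aGRAPA or SF-OGD. The function name `first_alert_time` and its parameters `scores`, `epsilon`, `alpha`, and `lam` are hypothetical names chosen for illustration.

```python
from typing import Iterable, Optional


def first_alert_time(
    scores: Iterable[float],
    epsilon: float = 0.5,
    alpha: float = 0.05,
    lam: float = 1.0,
) -> Optional[int]:
    """Return the 1-based step at which X_t first reaches 1/alpha, or None.

    Assumptions (not taken from the paper): scores lie in [0, 1] and the
    betting rate `lam` is constant, a stand-in for an adaptive strategy.
    """
    if not (0.0 < epsilon < 1.0 and 0.0 < alpha < 1.0):
        raise ValueError("epsilon and alpha must lie strictly between 0 and 1")
    # With scores in [0, 1], this bound keeps every factor strictly positive.
    if not 0.0 <= lam < 1.0 / (1.0 - epsilon):
        raise ValueError("lam must satisfy 0 <= lam < 1 / (1 - epsilon)")

    threshold = 1.0 / alpha
    wealth = 1.0  # X_0 = 1
    for t, m in enumerate(scores, start=1):
        if not 0.0 <= m <= 1.0:
            raise ValueError("scores are assumed to lie in [0, 1]")
        # Multiplicative update: X_t = X_{t-1} * (1 + lam * (epsilon - M_t)).
        wealth *= 1.0 + lam * (epsilon - m)
        if wealth >= threshold:
            return t  # evidence against H_0 has reached 1/alpha
    return None


if __name__ == "__main__":
    healthy_then_failing = [0.8] * 10 + [0.1] * 50
    print(first_alert_time(healthy_then_failing))
```

With a constant rate, the process shrinks while tracking is healthy, so a long healthy stretch delays later detection. This is one reason to adapt the betting rate over time, as the two strategies named above do.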
Abstract
Real-time online object tracking in videos constitutes a core task in computer vision, with wide-ranging applications including video surveillance, motion capture, and robotics. Deployed tracking systems usually lack formal safety assurances to convey when tracking is reliable and when it may fail, at best relying on heuristic measures of model confidence to raise alerts. To obtain such assurances, we propose interpreting object tracking as a sequential hypothesis test, wherein evidence for or against tracking failures is gradually accumulated over time. Leveraging recent advances in sequential testing, our test (formalized as an e-process) quickly identifies when tracking failures set in whilst provably containing false alerts at a desired rate, thus limiting potentially costly re-calibration or intervention steps. The approach is computationally lightweight, requires no extra training or fine-tuning, and is in principle model-agnostic. We propose both supervised and unsupervised variants, which leverage either ground-truth or solely internal tracking information, and demonstrate their effectiveness for two established tracking models across four video benchmarks. As such, sequential testing can offer a statistically grounded and efficient mechanism to incorporate safety assurances into real-time tracking systems.
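The false-alert guarantee can be sketched in two lines for the e-process $X_t$ defined in the TL;DR. The sketch rests on assumptions not stated in this section: $M_t \in [0, 1]$, $X_0 = 1$, and each betting rate $\lambda_t$ is chosen using only information in $\mathcal{F}_{t-1}$ with $0 \le \lambda_t \le 1/(1-\epsilon)$, so that $X_t$ stays nonnegative.

```latex
\begin{align*}
\mathbb{E}_{P_t}[X_t \mid \mathcal{F}_{t-1}]
  &= X_{t-1}\,\bigl(1 + \lambda_t\,(\epsilon - \mathbb{E}_{P_t}[M_t \mid \mathcal{F}_{t-1}])\bigr)
  \;\le\; X_{t-1}
  && \text{(under } H_0\text{)} \\
\mathbb{P}\bigl(\exists\, t \ge 1 : X_t \ge 1/\alpha\bigr)
  &\le \alpha\, X_0 = \alpha
  && \text{(Ville's inequality)}
\end{align*}
```

The first line holds because $\lambda_t \ge 0$ and, under $H_0$, $\epsilon - \mathbb{E}_{P_t}[M_t \mid \mathcal{F}_{t-1}] \le 0$, which makes $X_t$ a nonnegative supermartingale. The second line bounds the probability of ever raising an alert while $H_0$ holds, regardless of when monitoring stops.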
