Detecting Object Tracking Failure via Sequential Hypothesis Testing

Alejandro Monroy Muñoz; Rajeev Verma; Alexander Timans

TL;DR

This work tackles the lack of formal safety guarantees in real-time online object tracking by introducing a sequential hypothesis-testing framework based on e-processes to detect tracking failures with anytime-valid Type-I error control. It formulates tracking failure as a test between $H_0: \mathbb{E}_{P_t}[M_t|\mathcal{F}_{t-1}] \ge \epsilon$ for all $t$ and $H_1: \exists t$ with $\mathbb{E}_{P_t}[M_t|\mathcal{F}_{t-1}] < \epsilon$, and monitors an e-process $X_t = \prod_{i=1}^t [1 + \lambda_i(\epsilon - M_i)]$ that triggers an alert when $X_t \ge \frac{1}{\alpha}$, guaranteed by Ville's inequality. The framework supports both supervised and unsupervised tracking-quality signals and offers two betting-rate strategies, aGRAPA and SF-OGD, to maximize growth under failure. Empirically, it demonstrates controlled false-alarm rates (FPRs at or below the chosen level) and low detection delays across two tracker families (KCF, SiamFC) and four benchmarks, with model-agnostic and lightweight deployment. This sequential safety layer enables deployed trackers to monitor reliability in real time without retraining, providing a principled path toward safer autonomous vision systems.
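The alert rule described above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the function name `first_alert` is hypothetical, the metric $M_t$ is assumed to lie in $[0, 1]$, and a fixed betting rate `lam` stands in for the adaptive aGRAPA and SF-OGD strategies, whose details are not given in this summary.

```python
from typing import Iterable, Optional


def first_alert(metrics: Iterable[float], eps: float, alpha: float,
                lam: float) -> Optional[int]:
    """Return the 1-based step at which X_t >= 1/alpha, or None if no alert.

    Assumptions (not from the source): each metric value lies in [0, 1] and
    the betting rate `lam` is a fixed constant instead of a learned one.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    if not 0.0 <= eps < 1.0:
        raise ValueError("eps must lie in [0, 1)")
    # With M_t in [0, 1], this range keeps every factor 1 + lam * (eps - M_t)
    # nonnegative, so the running product stays a valid wealth process.
    if not 0.0 <= lam <= 1.0 / (1.0 - eps):
        raise ValueError("lam must lie in [0, 1 / (1 - eps)]")

    wealth = 1.0               # X_0 = 1
    threshold = 1.0 / alpha    # alert threshold from Ville's inequality
    for t, m in enumerate(metrics, start=1):
        if not 0.0 <= m <= 1.0:
            raise ValueError("metric values must lie in [0, 1]")
        wealth *= 1.0 + lam * (eps - m)   # X_t = X_{t-1} * [1 + lam * (eps - M_t)]
        if wealth >= threshold:
            return t
    return None


if __name__ == "__main__":
    # Tracking quality well above eps: the wealth shrinks and no alert is raised.
    print(first_alert([0.9] * 50, eps=0.5, alpha=0.05, lam=1.0))
    # Tracking quality collapses to zero: the wealth grows by 1.5x per frame.
    print(first_alert([0.0] * 20, eps=0.5, alpha=0.05, lam=1.0))
```

A fixed rate keeps the sketch short; an adaptive rate would change only how `lam` is chosen before each multiplication, using information available up to step $t-1$.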

Abstract

Real-time online object tracking in videos constitutes a core task in computer vision, with wide-ranging applications including video surveillance, motion capture, and robotics. Deployed tracking systems usually lack formal safety assurances to convey when tracking is reliable and when it may fail, at best relying on heuristic measures of model confidence to raise alerts. To obtain such assurances we propose interpreting object tracking as a sequential hypothesis test, wherein evidence for or against tracking failures is gradually accumulated over time. Leveraging recent advancements in the field, our sequential test (formalized as an e-process) quickly identifies when tracking failures set in whilst provably containing false alerts at a desired rate, and thus limiting potentially costly re-calibration or intervention steps. The approach is computationally light-weight, requires no extra training or fine-tuning, and is in principle model-agnostic. We propose both supervised and unsupervised variants by leveraging either ground-truth or solely internal tracking information, and demonstrate its effectiveness for two established tracking models across four video benchmarks. As such, sequential testing can offer a statistically grounded and efficient mechanism to incorporate safety assurances into real-time tracking systems.


Paper Structure (34 sections, 1 theorem, 39 equations, 6 figures, 3 tables)

This paper contains 34 sections, 1 theorem, 39 equations, 6 figures, 3 tables.

Introduction
Related Work
Object tracking models.
Reliable failure detection.
Detecting Tracking Failures via Testing
Problem and Method Formulation
Measuring Tracking Quality ($M_t$)
Supervised setting.
Unsupervised setting.
Learning the Betting Rate ($\lambda_t$)
Approximate GRAPA (aGRAPA).
Scale-Free Online Gradient Descent (SF-OGD).
Experimental Design
Datasets and tracking models.
Recency and window size.
...and 19 more sections

Key Result

Proposition A.13

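(Ville's inequality) If $\{W_t\}_{t \in \mathcal{T}}$ is an e-process for a null $\mathcal{P}$, then for any $\alpha \geq 1$:

$$\sup_{P \in \mathcal{P}} P\left(\exists t \in \mathcal{T} : W_t \geq \alpha\right) \leq \frac{1}{\alpha}.$$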
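To connect Ville's inequality with the alert rule $X_t \ge \frac{1}{\alpha}$ from the TL;DR, the following short derivation sketches why the product $X_t = \prod_{i=1}^t [1 + \lambda_i(\epsilon - M_i)]$ cannot grow in expectation under $H_0$. It assumes, beyond what this summary states, that each $\lambda_t \ge 0$ is chosen using only information up to $t-1$ and that every factor is nonnegative. Here $\alpha$ is the significance level in $(0, 1)$, as in the threshold $\frac{1}{\alpha}$.

```latex
% Assumptions (not stated in this excerpt): \lambda_t \ge 0 is
% \mathcal{F}_{t-1}-measurable, and 1 + \lambda_t(\epsilon - M_t) \ge 0.
\begin{align*}
\mathbb{E}_{P_t}[X_t \mid \mathcal{F}_{t-1}]
  &= X_{t-1}\,\bigl(1 + \lambda_t(\epsilon - \mathbb{E}_{P_t}[M_t \mid \mathcal{F}_{t-1}])\bigr) \\
  &\le X_{t-1}
  \quad \text{under } H_0, \text{ since } \mathbb{E}_{P_t}[M_t \mid \mathcal{F}_{t-1}] \ge \epsilon, \\
\Pr\Bigl(\exists t : X_t \ge \tfrac{1}{\alpha}\Bigr)
  &\le \alpha
  \quad \text{for } \alpha \in (0, 1), \text{ with } X_0 = 1.
\end{align*}
```

The first two lines show that $X_t$ is a nonnegative supermartingale when tracking quality stays at or above $\epsilon$; the last line is Ville's inequality applied to it, which bounds the probability of ever raising a false alert.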

Figures (6)

Figure 1: Conceptual example of a sequential test to detect object tracking failure. The tracker initially correctly follows the target object, but subsequently re-focuses on a wrong object due to occlusions. The test (e-process) mirrors the behaviour by remaining initially stable, but quickly grows and exceeds the signal threshold ($\frac{1}{\alpha}$) once tracking failure is determined, triggering an alert.
Figure 2: A schematic of the proposed sequential testing framework for object tracking. At any given time step $t$ the current frame $Y_t$ is fed to the tracker, whose produced prediction or response map (and potential ground truth information) inform a metric $M_t$ capturing object tracking quality. From it, evidence for or against tracking failure is added to a running measure of evidence (the e-process $X_t$) whose growth either remains stable (when tracking is satisfactory) or eventually triggers a failure alert (when exceeding a threshold $\frac{1}{\alpha}$).
Figure 3: An example of effective tracking failure detection for different tracking quality metrics $M_t$ (top) and corresponding e-processes $X_t$ (bottom) leveraging both aGRAPA and SF-OGD betting rates (see Learning the Betting Rate ($\lambda_t$)). The video sample (Car-1, OTB-100) depicts vehicle tracking in a traffic sequence, with subsequent tracking drift caused by a motorcycle occlusion. Failure alerts are raised consistently across metrics, with the supervised metric (NGIoU) reacting later but more stably, while the unsupervised ones tend to be more volatile.
Figure 4: Histograms depicting the distribution of detection delays across datasets and both aGRAPA and SF-OGD betting rates (see Learning the Betting Rate ($\lambda_t$)), for an e-process with supervised NGIoU metric (see Measuring Tracking Quality ($M_t$)). False positives (premature alerts) are marked red and denote the small and contained fraction of negative detection delays. Overall, detection delays are relatively small and concentrate close to zero, indicating quick and reactive failure alerts by sequential testing.
Figure 5: An example of partially unsuccessful tracking failure detection, for different tracking quality metrics $M_t$ (top) and corresponding e-processes $X_t$ (bottom) leveraging both aGRAPA and SF-OGD betting rates (see Learning the Betting Rate ($\lambda_t$)). The video sample (Sheep-1, LaSOT) depicts animal tracking in a natural environment, with subsequent tracking drift caused by rapid object movements. While the drift is clearly identified by the supervised signal (NGIoU), the unsupervised metrics struggle more to raise a proper alert, with peak correlation failing to do so entirely.
...and 1 more figures

Theorems & Definitions (13)

Example A.1
Definition A.2
Definition A.3
Definition A.4
Example A.5
Definition A.6
Example A.7
Definition A.8
Definition A.9
Definition A.10
...and 3 more
