
Detector-in-the-Loop Tracking: Active Memory Rectification for Stable Glottic Opening Localization

Huayu Wang, Bahaa Alattar, Cheng-Yen Yang, Hsiang-Wei Huang, Jung Heon Kim, Linda Shapiro, Nathan White, Jenq-Neng Hwang

TL;DR

Problem: unstable glottic opening localization in video laryngoscopy, caused by the lack of temporal context in single-frame detectors and by memory drift in trackers. Approach: Closed-Loop Memory Correction (CL-MC) forms a bidirectional loop between a single-frame detector and SAM2, using a state machine and memory rectification to actively reset tracker memory when drift is detected. Contributions: heterogeneous confidence alignment, state-machine-driven prediction selection, and representation-level memory rectification without retraining. Results: on Harborview emergency intubation videos, CL-MC achieves state-of-the-art results, with higher AUC and lower missing rates than the baselines, demonstrating robust temporal stability in challenging clinical scenes. Significance: provides a training-free, generalizable mechanism to stabilize medical video tracking under severe domain shift and artifacts, with potential extension to multi-object tracking and language-conditioned priors.
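The TL;DR lists heterogeneous confidence alignment as a contribution but does not give its form here. As a hedged illustration only, the sketch below assumes the detector emits a probability-like score and the tracker a logit-like score, and maps each onto [0, 1] with its own Platt-style logistic calibration so that a single threshold can compare them. `LogisticCalibrator`, `align_scores`, and the coefficients are hypothetical, not taken from the paper.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class LogisticCalibrator:
    """Hypothetical per-source calibration: p = sigmoid(a * raw + b).

    This is Platt-style logistic scaling; the coefficients a and b are
    assumptions that would have to be fitted on held-out frames.
    """

    a: float
    b: float

    def __call__(self, raw: float) -> float:
        z = self.a * raw + self.b
        # Numerically stable sigmoid: avoid exp() overflow for large |z|.
        if z >= 0:
            return 1.0 / (1.0 + math.exp(-z))
        ez = math.exp(z)
        return ez / (1.0 + ez)


def align_scores(det_score: float, trk_score: float,
                 det_cal: LogisticCalibrator,
                 trk_cal: LogisticCalibrator) -> tuple[float, float]:
    """Map a detector score and a tracker score onto one shared [0, 1] scale."""
    return det_cal(det_score), trk_cal(trk_score)


# Example: detector scores are already in [0, 1]; tracker scores are logits.
det_cal = LogisticCalibrator(a=10.0, b=-5.0)   # raw 0.5 -> 0.5
trk_cal = LogisticCalibrator(a=1.0, b=0.0)     # logit 0.0 -> 0.5
print(align_scores(0.5, 0.0, det_cal, trk_cal))  # (0.5, 0.5)
```

Logistic scaling is chosen here only because it is monotone and cheap to fit; any monotone calibration (for example isotonic regression) would serve the same illustrative purpose.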

Abstract

Temporal stability in glottic opening localization remains challenging due to the complementary weaknesses of single-frame detectors and foundation-model trackers: the former lacks temporal context, while the latter suffers from memory drift. In video laryngoscopy in particular, rapid tissue deformation, occlusions, and visual ambiguities in emergency settings demand a robust, temporally aware solution that prevents progressive tracking errors. We propose Closed-Loop Memory Correction (CL-MC), a detector-in-the-loop framework that supervises Segment Anything Model 2 (SAM2) through confidence-aligned state decisions and active memory rectification. High-confidence detections trigger semantic resets that overwrite corrupted tracker memory, effectively mitigating drift accumulation with a training-free foundation tracker in complex endoscopic scenes. On emergency intubation videos, CL-MC achieves state-of-the-art performance, significantly reducing drift and missing rate compared with SAM2 variants and open-loop methods. Our results establish memory correction as a crucial component of reliable clinical video tracking. Our code will be available at https://github.com/huayuww/CL-MR.
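The abstract describes confidence-aligned state decisions in which high-confidence detections trigger resets of the tracker memory. The following is a minimal sketch of such a detector-in-the-loop state machine under our own assumptions: both confidences are already on a shared [0, 1] scale, drift is flagged when a confident detection is missed or contradicted (low IoU) by the tracker, and the state names, thresholds, and functions (`next_state`, `select_box`) are hypothetical rather than the paper's.

```python
from enum import Enum, auto


class State(Enum):
    WAIT_INIT = auto()  # no trusted target yet
    TRACK = auto()      # accept the tracker (SAM2) prediction
    FALLBACK = auto()   # tracker weak: emit the detector box for this frame
    RESET = auto()      # confident detection (re)seeds the tracker memory
    LOST = auto()       # neither source is confident


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def next_state(state, det_conf, trk_conf, overlap,
               tau_init=0.8, tau_track=0.5, tau_reset=0.8, iou_min=0.3):
    """One transition of a hypothetical detector-in-the-loop state machine.

    overlap is the IoU between the detector box and the tracker box.
    All thresholds are illustrative placeholders, not values from the paper.
    """
    if state is State.WAIT_INIT:
        # Initialization: seed the tracker only from a confident detection.
        return State.RESET if det_conf >= tau_init else State.WAIT_INIT
    # Drift check: a confident detection that the tracker misses or contradicts.
    if det_conf >= tau_reset and (trk_conf < tau_track or overlap < iou_min):
        return State.RESET
    if trk_conf >= tau_track:
        return State.TRACK
    if det_conf >= tau_track:
        return State.FALLBACK
    return State.LOST


def select_box(state, det_box, trk_box):
    """Pick the output box for the current frame given the new state."""
    if state in (State.RESET, State.FALLBACK):
        return det_box
    if state is State.TRACK:
        return trk_box
    return None
```

Initialization and drift correction share the `RESET` state here because both amount to seeding the tracker from a trusted detection. Actually performing the reset (for example, re-prompting SAM2 with the detector box) depends on the tracker implementation and is not shown.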

Paper Structure (18 sections, 3 equations, 4 figures, 4 tables, 1 algorithm)

Figures (4)

  • Figure 1: Comparison between open-loop and closed-loop tracking.
  • Figure 2: Upon high-confidence initialization ($\tau_{init}$), the detector and SAM2 branches both process incoming frames $I_t$ to generate candidate bounding boxes and scores. The core component is the Memory Correction Module, which not only integrates predictions but also drives a Memory Rectification loop. Unlike passive FIFO updates, this loop actively uses high-confidence fusion results to reset or refresh the SAM2 Memory Bank, thereby preventing semantic drift and memory contamination in long sequences.
  • Figure 3: The detector is trained on the Laryngoscope8 and YouTube image datasets, while tracking performance is evaluated on the Harborview Dataset, a private dataset of 26 videos.
  • Figure 4: Visualization of qualitative results. Blue and green boxes denote the outputs of our method and YOLO, respectively.
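Figure 2 contrasts passive FIFO memory updates with active rectification. The toy sketch below makes that contrast concrete under stated assumptions: memories are plain strings, the split into one trusted anchor plus a FIFO of recent frames is only loosely inspired by SAM2-style trackers and is not SAM2's actual data structure, and `ToyMemoryBank`, its methods, and the choice to clear all recent entries on a reset are hypothetical readings of "reset", not the paper's confirmed rule.

```python
from collections import deque


class ToyMemoryBank:
    """Toy memory bank: one trusted anchor plus a FIFO of recent frames."""

    def __init__(self, capacity: int = 6):
        self.anchor = None                    # memory from a trusted (prompted) frame
        self.recent = deque(maxlen=capacity)  # FIFO: oldest entry evicted when full

    def passive_update(self, memory):
        """Open-loop update: every frame's memory is appended, even if drifted."""
        self.recent.append(memory)

    def rectify(self, trusted_memory):
        """Closed-loop reset: re-anchor on a high-confidence detection and
        discard recent entries that may be contaminated by drift."""
        self.anchor = trusted_memory
        self.recent.clear()

    def contents(self):
        """Return the anchor (if any) followed by recent memories, oldest first."""
        return ([self.anchor] if self.anchor is not None else []) + list(self.recent)


bank = ToyMemoryBank(capacity=3)
bank.rectify("det@0")                # high-confidence initialization
for t in range(1, 5):
    bank.passive_update(f"trk@{t}")  # FIFO keeps only the 3 newest entries
print(bank.contents())               # ['det@0', 'trk@2', 'trk@3', 'trk@4']
bank.rectify("det@5")                # drift detected: re-anchor and clear
print(bank.contents())               # ['det@5']
```

With passive updates alone, a drifted memory stays in the FIFO until it is evicted by age; the reset removes it as soon as a trusted detection is available.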