EdgeDAM: Real-time Object Tracking for Mobile Devices

Syed Muhammad Raza, Syed Murtaza Hussain Abidi, Khawar Islam, Muhammad Ibrahim, Ajmal Saeed Mian

TL;DR

EdgeDAM is a lightweight detection-guided tracking framework that reformulates distractor-aware memory for bounding-box tracking under strict edge constraints and introduces two key strategies: Dual-Buffer Distractor-Aware Memory and Confidence-Driven Switching with Held-Box Stabilization.

Abstract

Single-object tracking (SOT) on edge devices is a critical computer vision task, requiring accurate and continuous target localization across video frames under occlusion, distractor interference, and fast motion. However, recent state-of-the-art distractor-aware memory mechanisms are largely built on segmentation-based trackers and rely on mask prediction and attention-driven memory updates, which introduce substantial computational overhead and limit real-time deployment on resource-constrained hardware; meanwhile, lightweight trackers sustain high throughput but are prone to drift when visually similar distractors appear. To address these challenges, we propose EdgeDAM, a lightweight detection-guided tracking framework that reformulates distractor-aware memory for bounding-box tracking under strict edge constraints. EdgeDAM introduces two key strategies: (1) Dual-Buffer Distractor-Aware Memory (DAM), which integrates a Recent-Aware Memory to preserve temporally consistent target hypotheses and a Distractor-Resolving Memory to explicitly store hard negative candidates and penalize their re-selection during recovery; and (2) Confidence-Driven Switching with Held-Box Stabilization, where tracker reliability and temporal consistency criteria adaptively activate detection and memory-guided re-identification during occlusion, while a held-box mechanism temporarily freezes and expands the estimate to suppress distractor contamination. Extensive experiments on five benchmarks, including the distractor-focused DiDi dataset, demonstrate improved robustness under occlusion and fast motion while maintaining real-time performance on mobile devices, achieving 88.2% accuracy on DiDi and 25 FPS on an iPhone 15. Code will be released.
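
The second strategy, Confidence-Driven Switching with Held-Box Stabilization, can be illustrated with a small state machine. This is a minimal sketch under stated assumptions, not the authors' implementation: the class name `SwitchingController`, the thresholds `conf_thresh` and `iou_thresh`, the factor `expand_scale`, and the choice to expand the frozen box once about its center are all hypothetical, since the abstract does not specify them. The sketch returns the tracker box while the tracker is confident and temporally consistent, and otherwise freezes and expands the last trusted box until the caller reports a successful re-identification.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (x, y, width, height)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    iw = max(0.0, min(ax + aw, bx + bw) - max(ax, bx))
    ih = max(0.0, min(ay + ah, by + bh) - max(ay, by))
    inter = iw * ih
    union = aw * ah + bw * bh - inter
    return inter / union if union > 0 else 0.0


def expand(box: Box, scale: float) -> Box:
    """Scale a box about its center."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2
    nw, nh = w * scale, h * scale
    return (cx - nw / 2, cy - nh / 2, nw, nh)


@dataclass
class SwitchingController:
    conf_thresh: float = 0.5   # assumed tracker-reliability threshold
    iou_thresh: float = 0.3    # assumed temporal-consistency threshold
    expand_scale: float = 1.5  # assumed held-box expansion factor
    last_box: Optional[Box] = None
    held_box: Optional[Box] = None

    def step(self, tracker_box: Box, tracker_conf: float) -> Tuple[Box, str]:
        """Return (output_box, mode); "hold" means detection and re-ID should run."""
        consistent = (
            self.last_box is None
            or iou(tracker_box, self.last_box) >= self.iou_thresh
        )
        if self.held_box is None and tracker_conf >= self.conf_thresh and consistent:
            self.last_box = tracker_box
            return tracker_box, "track"
        if self.held_box is None:
            # Freeze the last trusted estimate and expand it once.
            anchor = self.last_box if self.last_box is not None else tracker_box
            self.held_box = expand(anchor, self.expand_scale)
        return self.held_box, "hold"

    def recover(self, box: Box) -> None:
        """Resume tracking after re-identification accepts a proposal."""
        self.held_box = None
        self.last_box = box
```

In this sketch the controller stays in the "hold" mode, ignoring even confident tracker outputs, until `recover` is called. That mirrors the stated goal of suppressing distractor contamination while the target is occluded, although the actual release criterion in EdgeDAM may differ.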

Paper Structure

This paper contains 20 sections, 14 equations, 3 figures, and 9 tables.

Figures (3)

  • Figure 1: Accuracy-efficiency comparison of representative trackers. Robust memory-based designs improve occlusion handling but incur high computational cost, while lightweight trackers sustain real-time throughput yet remain vulnerable to distractors.
  • Figure 2: EdgeDAM framework overview. Input frames are pre-processed and passed through a YOLOv11s-based detection backbone comprising SPPF, C2PSA, and C3K2 modules. These detections initialize a CSRT tracker that propagates object trajectories across frames. Missed detections due to occlusion or ambiguity are redirected to the DAM module, which uses RAM and DRM buffers to perform memory-guided re-identification via spatial and distractor-aware filtering. Accepted proposals (green tick) enable recovery, while rejected ones (red cross) are discarded. A post-processing module refines the final bounding box of a recovered object.
  • Figure 3: Top two rows (dashed border) show real-time tracking performance of EdgeDAM under occlusion on a mobile device (iPhone 15). Bottom two rows (solid border) present EdgeDAM performance on standard SOTA benchmarks.
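
The memory-guided re-identification step described in the Figure 2 caption can be sketched as follows. This is an illustrative sketch under assumptions, not the paper's method: it assumes each detection proposal comes with a center point and an appearance feature vector, uses cosine similarity as the matching score, and introduces hypothetical names and parameters (`DualBufferMemory`, `penalty`, `accept_thresh`, `max_dist`). The RAM buffer holds recent target features, the DRM buffer holds hard-negative features, and a proposal is accepted only if it passes a spatial gate and its distractor-penalized score clears the threshold.

```python
import math
from collections import deque
from typing import Deque, List, Optional, Sequence, Tuple

Feature = Tuple[float, ...]
Point = Tuple[float, float]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


class DualBufferMemory:
    """Hypothetical dual-buffer store: RAM (recent target) and DRM (distractors)."""

    def __init__(self, ram_size: int = 8, drm_size: int = 16,
                 penalty: float = 0.5, accept_thresh: float = 0.6) -> None:
        # Bounded FIFO buffers; all sizes and weights are assumed values.
        self.ram: Deque[Feature] = deque(maxlen=ram_size)
        self.drm: Deque[Feature] = deque(maxlen=drm_size)
        self.penalty = penalty
        self.accept_thresh = accept_thresh

    def remember_target(self, feat: Sequence[float]) -> None:
        self.ram.append(tuple(feat))

    def remember_distractor(self, feat: Sequence[float]) -> None:
        self.drm.append(tuple(feat))

    def score(self, feat: Sequence[float]) -> float:
        """Best target similarity minus a penalty for resembling a stored distractor."""
        target_sim = max((cosine(feat, f) for f in self.ram), default=0.0)
        distractor_sim = max((cosine(feat, f) for f in self.drm), default=0.0)
        return target_sim - self.penalty * max(0.0, distractor_sim)

    def reidentify(self, candidates: List[Tuple[Point, Sequence[float]]],
                   anchor: Point, max_dist: float) -> Optional[int]:
        """Return the index of the accepted proposal, or None if all are rejected."""
        best_idx: Optional[int] = None
        best_score = self.accept_thresh
        for i, (center, feat) in enumerate(candidates):
            if math.dist(center, anchor) > max_dist:
                continue  # spatial filter: too far from the last known position
            s = self.score(feat)
            if s >= best_score:
                best_idx, best_score = i, s
        return best_idx
```

A caller would typically pass the center of the last trusted box as `anchor`, and could add rejected proposals to the DRM buffer through `remember_distractor` so that they are penalized on later frames. How EdgeDAM actually extracts features and populates the two buffers is not stated in this summary.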