Deep Learning-based Lightweight RGB Object Tracking for Augmented Reality Devices

Alice Smith, Bob Johnson, Xiaoyu Zhu, Carol Lee

TL;DR

The paper tackles real-time on-device object tracking for AR under tight compute and memory constraints. It introduces a compact Siamese RGB tracker with a MobileNet-V2 backbone, combined with pruning, quantization, and knowledge distillation to preserve accuracy while sharply reducing model size and latency. Results on standard benchmarks and an AR platform show the approach running at around 30 FPS with accuracy competitive with larger trackers, and with an on-device footprint small enough for wearable devices. This work enables robust, interactive AR experiences on lightweight devices by combining high-accuracy tracking with hardware-aware efficiency.

Abstract

Augmented Reality (AR) applications often require robust real-time tracking of objects in the user's environment to correctly overlay virtual content. Recent advances in computer vision have produced highly accurate deep learning-based object trackers, but these models are typically too demanding in computation and memory for wearable AR devices. In this paper, we present a lightweight RGB object tracking algorithm designed specifically for resource-constrained AR platforms. The proposed tracker employs a compact Siamese neural network architecture and incorporates optimization techniques such as model pruning, quantization, and knowledge distillation to drastically reduce model size and inference cost while maintaining high tracking accuracy. We train the tracker offline on large video datasets and then deploy it on-device for real-time tracking. Experimental results on standard tracking benchmarks show that our approach achieves accuracy comparable to state-of-the-art trackers, yet runs in real time on a mobile AR headset at around 30 FPS -- more than an order of magnitude faster than prior high-performance trackers on the same hardware. This work enables practical, robust object tracking for AR use cases, opening the door to more interactive and dynamic AR experiences on lightweight devices.

Paper Structure

This paper contains 14 sections, 1 equation, 2 figures.

Figures (2)

  • Figure 1: Architecture of the proposed lightweight Siamese tracker. A shared CNN backbone (MobileNet-V2 based) extracts features from the target template (left) and the search region (right). These feature maps are cross-correlated to produce a response map, which is fed into small classification and regression heads to predict the target's presence and bounding box in the search region.
  • Figure 2: Comparison of tracking speed (frames per second) for different trackers on a mobile AR device (Snapdragon 845, Adreno 630 GPU). Our proposed tracker runs an order of magnitude faster than heavy trackers like SiamRPN++ or Ocean, and achieves performance comparable to LightTrack, enabling real-time tracking in AR.
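
The cross-correlation step described in the Figure 1 caption can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a plain (SiamFC-style) correlation that is summed over channels, uses tiny hand-written feature maps in place of MobileNet-V2 outputs, and omits the classification and regression heads entirely. The names `cross_correlate`, `response_map`, and `argmax_2d` are hypothetical helpers introduced only for this example.

```python
from typing import List, Tuple

Matrix = List[List[float]]   # one feature channel, indexed [row][col]
Features = List[Matrix]      # a stack of channels


def cross_correlate(search: Matrix, template: Matrix) -> Matrix:
    """Slide `template` over `search` (valid positions only) and sum products."""
    sh, sw = len(search), len(search[0])
    th, tw = len(template), len(template[0])
    out = []
    for i in range(sh - th + 1):
        row = []
        for j in range(sw - tw + 1):
            score = sum(
                search[i + di][j + dj] * template[di][dj]
                for di in range(th)
                for dj in range(tw)
            )
            row.append(score)
        out.append(row)
    return out


def response_map(search_feat: Features, template_feat: Features) -> Matrix:
    """Correlate matching channels, then sum them into a single response map.

    Assumption: channel-summed correlation; the paper's exact variant may differ.
    """
    per_channel = [cross_correlate(s, t) for s, t in zip(search_feat, template_feat)]
    h, w = len(per_channel[0]), len(per_channel[0][0])
    return [[sum(ch[i][j] for ch in per_channel) for j in range(w)] for i in range(h)]


def argmax_2d(response: Matrix) -> Tuple[int, int]:
    """Return the (row, col) of the highest response value."""
    best = max((v, i, j) for i, row in enumerate(response) for j, v in enumerate(row))
    return best[1], best[2]


# Toy data: a single-channel 4x4 search region containing the 2x2 template
# pattern with its top-left corner at row 1, column 2.
search_feat = [[
    [0, 0, 0, 0],
    [0, 0, 1, 2],
    [0, 0, 3, 4],
    [0, 0, 0, 0],
]]
template_feat = [[[1, 2], [3, 4]]]

response = response_map(search_feat, template_feat)
peak = argmax_2d(response)
print(peak)  # (1, 2)
```

The peak of the response map marks where the template best matches the search region. In a full tracker, the caption's classification and regression heads would refine this map into a target score and a bounding box.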