
Multi-resolution Rescored ByteTrack for Video Object Detection on Ultra-low-power Embedded Systems

Luca Bompani, Manuele Rusci, Daniele Palossi, Francesco Conti, Luca Benini

TL;DR

The paper tackles video object detection (VOD) on ultra-low-power MCUs with MR2-ByteTrack, a pipeline that interleaves full-resolution frames (320×320 pixels) with down-sized frames (192×192 pixels), associates detections across frames with the Kalman-filter-based ByteTrack tracker, and applies a Rescore algorithm to correct misclassifications over time. By sharing weights across resolutions and running on GAP9’s 9-core cluster, the approach reduces compute by up to 2.25× and latency by 43% while maintaining or improving mAP, with an average gain of 2.16% on ImageNetVID. The method outperforms frame-by-frame baselines and remains competitive with transformer-based VOD approaches, without extra training or significant memory overhead. This offers a practical, training-free path to real-time VOD on milliwatt-scale embedded devices, with applications in surveillance, robotics, and mobile autonomous systems.

Abstract

This paper introduces Multi-Resolution Rescored ByteTrack (MR2-ByteTrack), a novel video object detection framework for ultra-low-power embedded processors. This method reduces the average compute load of an off-the-shelf Deep Neural Network (DNN) based object detector by up to 2.25× by alternating the processing of high-resolution images (320×320 pixels) with multiple down-sized frames (192×192 pixels). To tackle the accuracy degradation due to the reduced image input size, MR2-ByteTrack correlates the output detections over time using the ByteTrack tracker and corrects potential misclassification using a novel probabilistic Rescore algorithm. By interleaving two down-sized images for every high-resolution one as the input of different state-of-the-art DNN object detectors with our MR2-ByteTrack, we demonstrate an average accuracy increase of 2.16% and a latency reduction of 43% on the GAP9 microcontroller compared to a baseline frame-by-frame inference scheme using exclusively full-resolution images. Code available at: https://github.com/Bomps4/Multi_Resolution_Rescored_ByteTrack
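
The interleaving arithmetic in the abstract can be checked with a back-of-the-envelope estimate. The sketch below assumes that detector cost scales linearly with the number of input pixels, which is only roughly true for fully convolutional detectors and is not a figure taken from the paper. The function name `relative_cost` and its parameters are hypothetical, introduced here for illustration only.

```python
def relative_cost(high_res: int = 320, low_res: int = 192, low_per_high: int = 2) -> float:
    """Average per-frame cost relative to an all-high-resolution baseline.

    ASSUMPTION: cost is proportional to the input pixel count. This is an
    illustrative model, not a measurement from the paper.
    """
    # Cost of one low-resolution frame when a high-resolution frame costs 1.0.
    low_cost = (low_res / high_res) ** 2
    # One high-res frame followed by `low_per_high` low-res frames, averaged.
    return (1.0 + low_per_high * low_cost) / (1 + low_per_high)


cost = relative_cost()   # schedule from the abstract: 2 low-res per high-res frame
saving = 1.0 - cost      # fraction of compute saved versus the baseline
reduction = 1.0 / cost   # equivalent speed-up factor

print(f"relative cost: {cost:.3f}")
print(f"saving: {saving:.1%}")
print(f"reduction factor: {reduction:.2f}x")
```

Under this assumption, the schedule of two 192×192 frames per 320×320 frame costs about 0.57 of the baseline, a saving of roughly 43%, which is in line with the latency reduction quoted above. The sketch does not attempt to reproduce the "up to 2.25×" figure, which presumably depends on the interleaving ratio and on the detector's actual operation count.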

Paper Structure

This paper contains 18 sections, 1 equation, 3 figures, 4 tables, 1 algorithm.

Figures (3)

  • Figure 1: Video object detection using the NanoDet-Plus object detector under multiple thresholds and input size settings vs. the proposed MR2-ByteTrack solution.
  • Figure 2: Overview of the proposed Multi-Resolution Rescored ByteTrack algorithm for video object detection.
  • Figure 3: mAP (blue) and GMAC (red) of MR2-ByteTrack for varying numbers of low-res frames vs. the baseline.