Real-Time Multi-Object Tracking using YOLOv8 and SORT on a SoC FPGA

Michal Danilowicz, Tomasz Kryjak

TL;DR

The paper addresses real-time multi-object tracking on energy-constrained embedded platforms by presenting a heterogeneous SoC FPGA design that couples a quantized YOLOv8_nano detector, implemented in the programmable logic (PL), with a SORT tracker executed in the processor system (PS). The detector is trained with Quantization-Aware Training using 4-bit weights and activations, is deployed via the FINN framework, and keeps its parameters in external memory. The detector reaches a throughput of ≈195.3 fps; detection quality is evaluated on COCO (mAP ≈ 0.21) and tracking performance on MOT15 (MOTA ≈ 38.9). This work demonstrates the practical feasibility of embedded MOT on MPSoC FPGAs and outlines trade-offs and future enhancements for energy-efficient perception in mobile robotics and autonomous systems.

Abstract

Multi-object tracking (MOT) is one of the most important problems in computer vision and a key component of any vision-based perception system used in advanced autonomous mobile robotics. Therefore, its implementation on low-power and real-time embedded platforms is highly desirable. Modern MOT algorithms should be able to track objects of a given class (e.g. people or vehicles). In addition, the number of objects to be tracked is not known in advance, and they may appear and disappear at any time, as well as be obscured. For these reasons, the most popular and successful approaches have recently been based on the tracking-by-detection paradigm. Therefore, the presence of a high-quality object detector is essential, and in practice the detector accounts for the vast majority of the computational and memory complexity of the whole MOT system. In this paper, we propose an FPGA (Field-Programmable Gate Array) implementation of an embedded MOT system based on a quantized YOLOv8 detector and the SORT (Simple Online Realtime Tracker) tracker. We use a modified version of the FINN framework to utilize external memory for model parameters and to support the operations required by YOLOv8. We discuss the evaluation of detection and tracking performance using the COCO and MOT15 datasets, where we achieve 0.21 mAP and 38.9 MOTA, respectively. As the computational platform, we use an MPSoC system (Zynq UltraScale+ device from AMD/Xilinx) where the detector is deployed in reprogrammable logic and the tracking algorithm is implemented in the processor system.
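
To make the detector/tracker split concrete, the following is a minimal sketch of the data-association step that a SORT-style tracker performs for each frame: boxes predicted for existing tracks are matched to new detections by intersection over union (IoU). It is not the authors' implementation. SORT itself solves this assignment with the Hungarian algorithm and predicts track boxes with a Kalman filter; here a simpler greedy matching stands in for the Hungarian step, and the function names, the `(x1, y1, x2, y2)` box format and the `iou_min` value of 0.3 are assumptions made for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0


def associate(tracks, detections, iou_min=0.3):
    """Match predicted track boxes to detections by descending IoU.

    Greedy stand-in (assumption) for the Hungarian assignment used by SORT.
    Returns (matches, unmatched_track_indices, unmatched_detection_indices).
    """
    candidates = sorted(
        ((iou(t, d), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(detections)),
        reverse=True,
    )
    matches, used_tracks, used_dets = [], set(), set()
    for score, ti, di in candidates:
        if score < iou_min:
            break  # remaining pairs overlap too little to be the same object
        if ti in used_tracks or di in used_dets:
            continue
        matches.append((ti, di))
        used_tracks.add(ti)
        used_dets.add(di)
    unmatched_tracks = [i for i in range(len(tracks)) if i not in used_tracks]
    unmatched_dets = [i for i in range(len(detections)) if i not in used_dets]
    return matches, unmatched_tracks, unmatched_dets


if __name__ == "__main__":
    predicted = [(0, 0, 10, 10), (20, 20, 30, 30)]           # from existing tracks
    detected = [(21, 21, 31, 31), (1, 1, 11, 11), (50, 50, 60, 60)]
    print(associate(predicted, detected))
```

In a tracker of this kind, unmatched detections typically start new tracks and tracks that stay unmatched for several frames are dropped, which is how objects appearing and disappearing at any time are handled.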

Paper Structure

This paper contains 11 sections, 1 equation, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Top-level diagram of our multi-object tracking system implemented in SoC FPGA.
  • Figure 2: Topology of the YOLOv8 detector used in the proposed MOT system. The Conv block contains a quantised 2D convolution, batch normalisation, ReLU and an activation quantiser. Red arrows and green arrows represent two groups of quantised tensors that each share a common quantisation scale. This was necessary to properly simplify the computational graph; more details are given in the section on the FPGA architecture.
  • Figure 3: Parallel task sharing between PS and PL in the SoC FPGA device. The task lengths shown are not to scale; they only illustrate that the PL part is faster than the PS part in our case.
  • Figure 4: Hardware-software system implemented in the ZCU102 SoC FPGA.
  • Figure 5: Deployment of a single Conv block to an FPGA using the FINN library. (a) First, the block is loaded from Brevitas code into a graph representation. (b) Then, affine transformations are collapsed into the MultiThreshold operation; the Mul node at the end can be moved past the following convolution in the network or, if this is the end of the accelerator, delegated to postprocessing on the PS. (c) Finally, each node is represented by an IP core from the finn-hlslib library (a convolution is a sequence of padding, context generation and matrix multiplication).
  • ...and 2 more figures
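
The caption of Figure 5 mentions that affine transformations are collapsed into a MultiThreshold operation. The sketch below illustrates the idea on scalars, under stated assumptions: a positive batch-normalisation gain, an unsigned 4-bit activation quantiser with round-half-up behaviour, and hypothetical function names. It is not FINN code. It only shows why "affine transform, ReLU, quantise" can be replaced by counting how many precomputed thresholds the raw convolution output reaches; the remaining multiplication by the scale corresponds to the Mul node at the end of the block.

```python
import math


def quantized_relu_code(y, scale, bits=4):
    """Reference path: ReLU followed by a uniform unsigned quantiser.

    Returns the integer code; the dequantised value would be code * scale.
    """
    levels = 2 ** bits - 1
    code = math.floor(y / scale + 0.5)  # round half up (assumption)
    return max(0, min(levels, code))    # clamping at 0 also realises the ReLU


def make_thresholds(scale, bits=4, gain=1.0, bias=0.0):
    """Thresholds on the raw input x with the affine y = gain * x + bias folded in.

    Assumes gain > 0, so the comparison direction is preserved.
    """
    levels = 2 ** bits - 1
    return [((k - 0.5) * scale - bias) / gain for k in range(1, levels + 1)]


def multithreshold(x, thresholds):
    """Integer code = number of thresholds that x meets or exceeds."""
    return sum(1 for t in thresholds if x >= t)


if __name__ == "__main__":
    gain, bias, scale = 2.0, 1.0, 0.5   # illustrative values only
    thresholds = make_thresholds(scale, gain=gain, bias=bias)
    for x in (-1.0, 0.0, 0.4, 10.0):
        reference = quantized_relu_code(gain * x + bias, scale)
        print(x, reference, multithreshold(x, thresholds))
```

The appeal for hardware is that the thresholds are computed once offline, so at run time each activation needs only comparisons against constants rather than a multiply, an add and a rounding step.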