RAMOTS: A Real-Time System for Aerial Multi-Object Tracking based on Deep Learning and Big Data Technology

Nhat-Tan Do, Nhi Ngoc-Yen Nguyen, Dieu-Phuong Nguyen, Trong-Hop Do

TL;DR

RAMOTS presents a real-time aerial MOT framework that couples a big-data streaming stack (Kafka/Spark) with state-of-the-art detectors (YOLOv8/YOLOv10) and trackers (ByteTrack/BoTSORT) to address UAV-specific challenges such as small targets and dynamic viewpoints. The framework demonstrates real-time performance (28 FPS on a single GPU) and competitive MOT metrics (e.g., HOTA 48.14, MOTA 43.51 on VisDrone2019-MOT) on public UAV datasets (VisDrone2019-MOT, UAVDT). Experimental results indicate BoTSORT generally outperforms ByteTrack in this setting, with YOLOv8l+BoTSORT achieving top performance on VisDrone and YOLOv10l+BoTSORT excelling on UAVDT, while RT-DETR variants show potential but require further optimization. The work highlights the practical value of integrating deep learning-based MOT with scalable, fault-tolerant big-data platforms for real-time UAV applications, and points to future directions like additional sensors and adaptive resource allocation for UAV swarms.

Abstract

Multi-object tracking (MOT) in UAV-based video is challenging due to variations in viewpoint, low resolution, and the presence of small objects. While other research on MOT for aerial videos focuses primarily on developing sophisticated algorithms, the practical deployment of these systems has received little attention. In this paper, we propose a novel real-time MOT framework that integrates Apache Kafka and Apache Spark for efficient and fault-tolerant video stream processing, along with the state-of-the-art deep learning models YOLOv8/YOLOv10 and ByteTrack/BoTSORT for accurate object detection and tracking. Our work highlights the importance of not only advanced algorithms but also their integration with scalable, distributed systems. By leveraging these technologies, our system achieves a HOTA of 48.14 and a MOTA of 43.51 on the VisDrone2019-MOT test set while maintaining a real-time processing speed of 28 FPS on a single GPU. Our work demonstrates the potential of big data technologies and deep learning for addressing the challenges of MOT in UAV applications.
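For readers unfamiliar with the MOTA score quoted above, the following is a minimal sketch of the standard CLEAR MOT definition, MOTA = 1 - (FN + FP + IDSW) / GT, expressed as a percentage. The function name `mota` and the counts in the usage lines are hypothetical and chosen only for illustration; they are not taken from the paper, whose evaluation tooling is not described in this section.

```python
def mota(false_negatives: int, false_positives: int,
         id_switches: int, num_gt: int) -> float:
    """Return MOTA as a percentage under the standard CLEAR MOT definition.

    num_gt is the total number of ground-truth object instances summed
    over all frames; the other arguments are error counts summed the same way.
    """
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    errors = false_negatives + false_positives + id_switches
    return 100.0 * (1.0 - errors / num_gt)


# Hypothetical counts for illustration only (not results from the paper).
score = mota(false_negatives=300, false_positives=120,
             id_switches=20, num_gt=1000)
print(f"MOTA = {score:.2f}")
```

Because the error counts are summed rather than capped, MOTA can become negative when a tracker makes more errors than there are ground-truth objects.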

Paper Structure

This paper contains 11 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Our proposed Real-time UAV Videos Multi-object Tracking System
  • Figure 2: Samples from the UAVDT and VisDrone datasets. The videos from the two datasets cover various aspects, including location, environment, lighting, objects, and density.
  • Figure 3: Comparative performance of deep learning-based MOT methods on the VisDrone2019-MOT and UAVDT datasets. The horizontal axis, vertical axis, and circle radius represent IDF1, MOTA, and FPS, respectively. Detailed comparisons are shown in \ref{tab:exp-results} and \ref{tab:exp-results_2}.