EMT: A Visual Multi-Task Benchmark Dataset for Autonomous Driving

Nadya Abdel Madjid, Murad Mebrahtu, Abdulrahman Ahmad, Abdelmoamen Nasser, Bilal Hassan, Naoufel Werghi, Jorge Dias, Majid Khonji

TL;DR

EMT addresses the need for a region-specific, multi-task benchmark in autonomous driving by providing a unified visual dataset that supports tracking, trajectory forecasting, and intention prediction. The authors systematically evaluate multiple task-specific models and cross-task dependencies, offering baseline detectors, trackers, and predictors across three complementary benchmarks collected in the UAE. Key contributions include three aligned task datasets with extensive annotations (over 570k bounding boxes across ~20 videos), diverse driving scenarios, and robust evaluation protocols, enabling cross-task analysis and generalization to underrepresented regions. The work advances practical autonomous driving research by enabling region-aware model development and cross-task assessment, while outlining directions for Sim2Real and multimodal extensions to further enhance safety and reliability in Gulf-region traffic.

Abstract

This paper introduces the Emirates Multi-Task (EMT) dataset, designed to support multi-task benchmarking within a unified framework. It comprises over 30,000 frames from a dash-camera perspective and 570,000 annotated bounding boxes, covering approximately 150 kilometers of driving routes that reflect the distinctive road topology, congestion patterns, and driving behavior of Gulf region traffic. The dataset supports three primary tasks: tracking, trajectory forecasting, and intention prediction. Each benchmark is accompanied by corresponding evaluations: (1) multi-agent tracking experiments addressing multi-class scenarios and occlusion handling; (2) trajectory forecasting evaluation using deep sequential and interaction-aware models; and (3) intention prediction experiments based on observed trajectories. The dataset is publicly available at https://avlab.io/emt-dataset, with pre-processing scripts and evaluation models at https://github.com/AV-Lab/emt-dataset.
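
As a rough sanity check on the scale described in the abstract, the quoted figures can be turned into average annotation densities. This is only a back-of-the-envelope sketch: the three constants are the rounded values from the abstract ("over 30,000" frames, 570,000 boxes, "approximately 150" km), and the helper name `annotation_density` is hypothetical, not part of the EMT tooling.

```python
# Rounded figures quoted in the abstract; the true counts are slightly different
# ("over 30,000" frames, "approximately 150" km), so results are approximate.
FRAMES = 30_000
BOXES = 570_000
ROUTE_KM = 150


def annotation_density(boxes: int, frames: int, route_km: float) -> dict:
    """Return average densities derived from the headline dataset figures.

    Hypothetical helper for illustration only.
    """
    return {
        "boxes_per_frame": boxes / frames,
        "frames_per_km": frames / route_km,
        "boxes_per_km": boxes / route_km,
    }


density = annotation_density(BOXES, FRAMES, ROUTE_KM)
for name, value in density.items():
    print(f"{name}: {value:.1f}")
```

With these rounded inputs the average comes out to about 19 annotated boxes per frame. Because the abstract says "over" 30,000 frames, that value should be read as an upper estimate rather than an exact statistic.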

Paper Structure

This paper contains 31 sections, 2 equations, 6 figures, and 10 tables.

Figures (6)

  • Figure 1: Samples from EMT dataset capturing highway scenarios in day and night time, clear and rainy weather.
  • Figure 2: Samples of annotated agents, including small motorized vehicles, medium and large vehicles, emergency vehicles, buses, motorbikes (comprising motorbike and rider), and cyclists (comprising bicycle and rider).
  • Figure 3: Common traffic scenarios in the UAE include: (a) large city junctions, often featuring unique elements like free right turns to reduce congestion, (b) roundabouts with vehicles navigating through various stages such as approaching, circulating, and exiting, and (c) highways with exits, where vehicles transition smoothly by merging into designated lanes and exiting the main road.
  • Figure 4: Samples of predicted trajectories: ground truth trajectories are shown in red, while predictions are depicted in green. The predicted trajectories accurately capture the agent's heading but exhibit lower accuracy in predicting speed, which in turn affects the precise estimation of future trajectory locations.
  • Figure 5: Samples of predicted intentions during daytime. The ground truth intention for the next timestamp is shown in orange, while predictions are in purple. Red rectangles highlight misclassifications, such as predicting "keep_lane" for walking and crossing, failing to predict "merge," and incorrectly assigning "stop" to a reversing vehicle. Green indicates correctly predicted intentions, including accurate "keep_lane" predictions on the main road, successful "merge" predictions from the right road, and correct stopping behavior at crossings.
  • ...and 1 more figure