R3ST: A Synthetic 3D Dataset With Realistic Trajectories

Simone Teglia, Claudia Melis Tonti, Francesco Pro, Leonardo Russo, Andrea Alfarano, Leonardo Pentassuglia, Irene Amerini

TL;DR

R3ST tackles the gap between synthetic data convenience and realism in vehicle motion by embedding real trajectories from SinD into Blender-rendered urban intersections, producing photorealistic imagery with rich multimodal annotations. The dataset spans two intersections with four camera views and over 80K frames, enabling tasks from object detection to monocular depth estimation and trajectory forecasting. By evaluating pre-trained models and showing strong detection performance after fine-tuning, the work demonstrates R3ST's utility for training and evaluating autonomous driving and traffic-analysis systems. This approach promises better generalization to real-world driving, addressing domain shift while preserving the benefits of synthetic data generation.

Abstract

Datasets are essential to train and evaluate computer vision models used for traffic analysis and to enhance road safety. Existing real datasets fit real-world scenarios, capturing authentic road object behaviors; however, they typically lack precise ground-truth annotations. In contrast, synthetic datasets allow a large number of frames to be annotated without additional cost or time. However, a general drawback of synthetic datasets is the lack of realistic vehicle motion, since trajectories are generated using AI models or rule-based systems. In this work, we introduce R3ST (Realistic 3D Synthetic Trajectories), a synthetic dataset that overcomes this limitation by generating a synthetic 3D environment and integrating real-world trajectories derived from SinD, a bird's-eye-view dataset recorded from drone footage. The proposed dataset closes the gap between synthetic data and realistic trajectories and advances research in trajectory forecasting of road vehicles by offering both accurate multimodal ground-truth annotations and authentic human-driven vehicle trajectories.
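The abstract does not spell out how bird's-eye-view tracks are placed into the 3D scene, so the following is only a minimal sketch of one plausible approach, not the authors' pipeline. It assumes a hypothetical track schema (timestamp, ground-plane x/y in meters, heading in radians), a flat ground plane, and illustrative parameters (`fps`, `origin`, `scale`, `ground_z`) that are not taken from the paper or from SinD.

```python
from dataclasses import dataclass


@dataclass
class BevSample:
    """One bird's-eye-view track sample (hypothetical schema)."""
    t: float        # timestamp in seconds
    x: float        # ground-plane position in meters
    y: float
    heading: float  # yaw angle in radians


@dataclass
class Keyframe:
    """A 3D pose to be keyed on a vehicle object at a given frame."""
    frame: int
    location: tuple  # (x, y, z) in scene units
    yaw: float       # rotation about the vertical axis, in radians


def bev_to_keyframes(samples, fps=30.0, origin=(0.0, 0.0), scale=1.0, ground_z=0.0):
    """Map 2D track samples to 3D keyframes on a flat ground plane.

    Assumptions (illustrative only): the scene is aligned with the BEV
    axes up to a translation (`origin`) and a uniform `scale`, and every
    vehicle sits at the constant height `ground_z`.
    """
    keyframes = []
    for s in samples:
        frame = round(s.t * fps)  # nearest render frame for this timestamp
        location = (
            (s.x - origin[0]) * scale,
            (s.y - origin[1]) * scale,
            ground_z,
        )
        keyframes.append(Keyframe(frame, location, s.heading))
    return keyframes
```

In a rendering tool, each resulting keyframe could then be assigned to a vehicle object's position and rotation so that the rendered motion follows the recorded human-driven path rather than a rule-based one.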

Paper Structure

This paper contains 10 sections, 3 figures, 2 tables.

Figures (3)

  • Figure 1: The R3ST Dataset. It provides photorealistic synthetic images (first), depth maps (second), instance segmentation masks (third), and object detection bounding boxes (fourth), enabling diverse computer vision tasks. Each task is presented on a different frame.
  • Figure 2: Visualization of clustered vehicle trajectories in an R3ST crossroad. Each colored trajectory represents a cluster of similar paths, while the shaded regions indicate variance within each cluster.
  • Figure 3: Qualitative results for instance segmentation (top two rows) and monocular depth estimation (bottom two rows). The first column shows RGB frames, the second column contains ground-truth annotations, and the third and fourth columns present model predictions. Instance segmentation results are obtained using YOLO-Seg [yolo11_ultralytics] and the SAM2 [ravi2024sam2segmentimages] online demo, while monocular depth estimation is performed with AnyDepth [he2025distilldepthdistillationcreates] and Pixelformer Large [agarwal2022attentionattentioneverywheremonocular] pre-trained on KITTI [Geiger2013IJRR].