PoLaRIS Dataset: A Maritime Object Detection and Tracking Dataset in Pohang Canal

Jiwon Choi, Dongjin Cho, Gihyeon Lee, Hogyun Kim, Geonmo Yang, Joowan Kim, Younggun Cho

TL;DR

PoLaRIS addresses the scarcity of maritime perception data by delivering a multi-modal dataset with synchronized RGB, TIR, LiDAR, and Radar annotations, including dynamic obstacles and depth cues for robust detection and tracking. The dataset supports both 2D and 3D evaluation through ground-truth bounding boxes and object IDs, enabling cross-modal benchmarking under day and night conditions. A semi-automatic labeling pipeline combines detector-assisted proposals with manual refinement to produce accurate annotations across sensors and scales, including small dynamic objects. Benchmark results using conventional and state-of-the-art detectors and trackers demonstrate the dataset’s utility for advancing autonomous naval navigation and obstacle avoidance, with the dataset publicly available at https://sites.google.com/view/polaris-dataset.

Abstract

Maritime environments often present hazardous situations due to factors such as moving ships or buoys, which become obstacles under the influence of waves. In such challenging conditions, the ability to detect and track potentially hazardous objects is critical for the safe navigation of marine robots. To address the scarcity of comprehensive datasets capturing these dynamic scenarios, we introduce a new multi-modal dataset that includes image and point-wise annotations of maritime hazards. Our dataset provides detailed ground truth for obstacle detection and tracking, including objects as small as 10$\times$10 pixels, which are crucial for maritime safety. To validate the dataset's effectiveness as a reliable benchmark, we conducted evaluations using various methodologies, including state-of-the-art (SOTA) techniques for object detection and tracking. These evaluations are expected to contribute to performance improvements, particularly in the complex maritime environment. To the best of our knowledge, this is the first dataset offering multi-modal annotations specifically tailored to maritime environments. Our dataset is available at https://sites.google.com/view/polaris-dataset.

Paper Structure

This paper contains 26 sections, 8 equations, 7 figures, 3 tables.

Figures (7)

  • Figure 1: The 11194th scene of the Pohang00 sequence. A clustered, sparse Radar point cloud is scattered widely across the background, while a dense red LiDAR point cloud covers a relatively short range. To convert this scene to PoLaRIS00, we first annotate an RGB image. Then, the annotated bounding boxes in the RGB image are mapped to TIR. Finally, we extract only the LiDAR and Radar points that project onto the annotated bounding boxes of the RGB image to achieve multi-modal annotation.
  • Figure 2: The vertical axis represents the Pohang00-04 sequences for the sensor modalities: image, LiDAR, and Radar. The horizontal axis indicates the number of labeled data for each sensor modality. Camera refers to left, right, and TIR image data, while Radar and LiDAR primarily detect dynamic obstacles and have limitations in identifying distant objects, resulting in significantly fewer data points compared to image data.
  • Figure 3: The process of annotating the left image. (a) shows the annotation process during the day, where objects are more easily detected and labeled. In contrast, (b) illustrates the annotation process at night, requiring additional steps like image restoration due to low visibility. The Initial Bounding box (Bbox) results from the initialization step, while FP and FN Bbox represent false positive and false negative detections, respectively. The Accurate Bbox corresponds to the provided ground truth labels.
  • Figure 4: The process of semi-automatic annotation for multi-modal sensors. (a) shows the process of defining labels in the TIR image using transformation data from the left image and manually correcting label errors. (b) shows the process of filtering LiDAR points based on the ground truth labels from the left image to obtain LiDAR point annotations. (c) demonstrates the process of clustering Radar points and defining their labels by identifying clusters that overlap with the labeled points obtained in (b).
  • Figure 5: Illustration of dynamic objects.
  • ...and 2 more figures
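
The captions for Figures 1 and 4(b) describe keeping only the LiDAR points that project into an annotated bounding box of the left image. The following is a minimal sketch of that idea, not the authors' implementation. It assumes the points are already expressed in the camera frame (z pointing forward) and uses an ideal pinhole model without distortion. The names `project` and `points_in_box` and the intrinsics `fx`, `fy`, `cx`, `cy` are hypothetical and do not come from the dataset tooling.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float, float]       # (x, y, z) in the camera frame, z forward (assumed)
Box = Tuple[float, float, float, float]  # (u_min, v_min, u_max, v_max) in pixels


def project(point: Point, fx: float, fy: float,
            cx: float, cy: float) -> Optional[Tuple[float, float]]:
    """Ideal pinhole projection; returns None for points at or behind the camera."""
    x, y, z = point
    if z <= 0.0:
        return None
    return (fx * x / z + cx, fy * y / z + cy)


def points_in_box(points: List[Point], box: Box, fx: float, fy: float,
                  cx: float, cy: float) -> List[Point]:
    """Keep the points whose image projection falls inside the bounding box."""
    u_min, v_min, u_max, v_max = box
    kept = []
    for p in points:
        uv = project(p, fx, fy, cx, cy)
        if uv is None:
            continue  # point cannot be seen by the camera
        u, v = uv
        if u_min <= u <= u_max and v_min <= v <= v_max:
            kept.append(p)
    return kept


if __name__ == "__main__":
    # Illustrative values only; real intrinsics come from the sensor calibration.
    cloud = [(0.0, 0.0, 10.0), (5.0, 0.0, 10.0), (0.0, 0.0, -2.0)]
    bbox = (300.0, 220.0, 340.0, 260.0)
    print(points_in_box(cloud, bbox, 500.0, 500.0, 320.0, 240.0))
```

In this toy call only the first point is kept: the second projects outside the box and the third lies behind the camera. A real pipeline would first transform the points from the LiDAR frame into the camera frame using the extrinsic calibration. Points from background structures that happen to project into the box would also pass this test.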