Approximate Supervised Object Distance Estimation on Unmanned Surface Vehicles

Benjamin Kiefer, Yitong Quan, Andreas Zell

TL;DR

The paper addresses the need for cost-effective distance estimation on USVs by using supervised object detection to predict object distances directly from monocular imagery. It adds an auxiliary distance head to YOLO detectors, explores several distance normalization schemes, and trains with a composite loss on a maritime dataset (1000 images with bounding boxes and chart-derived distances) plus additional human-labeled data. In the experiments, the method achieves competitive object detection performance, shows that distance can be estimated in real time, and outperforms triangulation and monocular depth baselines in mean distance error, particularly when combined with tracking and smoothing. The result is a practical, end-to-end vision-based distance estimation approach for USVs, with publicly released data and guidance on the trade-offs between distance accuracy and detection performance across camera setups and distance ranges.
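The ingredients named in this summary (distance normalization, a composite loss, and smoothing of tracked distances) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the `D_MAX` range, the linear and logarithmic normalization variants, the mean-squared-error distance term, the `dist_weight` factor, and the exponential moving average used for smoothing.

```python
import math

D_MAX = 1000.0  # assumed maximum distance in metres (hypothetical value)


def normalize_linear(d, d_max=D_MAX):
    """Map a distance in [0, d_max] linearly onto [0, 1]."""
    return min(max(d, 0.0), d_max) / d_max


def normalize_log(d, d_max=D_MAX):
    """Map a distance onto [0, 1] on a log scale, giving near objects more resolution."""
    d = min(max(d, 0.0), d_max)
    return math.log1p(d) / math.log1p(d_max)


def denormalize_log(t, d_max=D_MAX):
    """Invert normalize_log for values of t in [0, 1]."""
    return math.expm1(t * math.log1p(d_max))


def composite_loss(detection_loss, pred_norm, target_norm, dist_weight=1.0):
    """Detection loss plus a weighted mean-squared error on normalized distances."""
    if not pred_norm:
        return detection_loss
    dist_loss = sum((p - t) ** 2 for p, t in zip(pred_norm, target_norm)) / len(pred_norm)
    return detection_loss + dist_weight * dist_loss


def ema_smooth(distances, alpha=0.3):
    """Exponentially smooth the per-frame distance estimates of one tracked object."""
    smoothed, state = [], None
    for d in distances:
        state = d if state is None else alpha * d + (1.0 - alpha) * state
        smoothed.append(state)
    return smoothed
```

A logarithmic scheme is one plausible way to keep relative error roughly uniform across near and far objects, while the weight on the distance term is the knob that would trade distance accuracy against detection performance.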

Abstract

Unmanned surface vehicles (USVs) and boats are increasingly important in maritime operations, yet their deployment is limited due to costly sensors and complexity. LiDAR, radar, and depth cameras are either costly, yield sparse point clouds or are noisy, and require extensive calibration. Here, we introduce a novel approach for approximate distance estimation in USVs using supervised object detection. We collected a dataset comprising images with manually annotated bounding boxes and corresponding distance measurements. Leveraging this data, we propose a specialized branch of an object detection model, not only to detect objects but also to predict their distances from the USV. This method offers a cost-efficient and intuitive alternative to conventional distance measurement techniques, aligning more closely with human estimation capabilities. We demonstrate its application in a marine assistance system that alerts operators to nearby objects such as boats, buoys, or other waterborne hazards.

Paper Structure (24 sections, 9 equations, 5 figures, 6 tables)

Figures (5)

  • Figure 1: Example scene and its bounding box and distance predictions using our method (top). Projected detections onto the 2D plane (bottom).
  • Figure 2: Architecture of our proposed approach, using YOLOv7 and YOLOv9 as examples. We leave the base architecture of the networks unchanged except for adding a distance loss branch to the heads (in red).
  • Figure 3: Distribution of distances in our dataset.
  • Figure 4: Distribution of distances by category.
  • Figure 5: Labeling tool used for getting bounding boxes and associating chart data with vision data.
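
Figure 1 mentions projecting detections onto a 2D plane. The page does not say how that projection is computed, so the following is only a sketch under stated assumptions: a forward-facing pinhole camera with a known horizontal field of view, and the predicted distance treated as range along the bearing of the bounding-box centre. The function name `project_to_plane` and all of its parameters are hypothetical.

```python
import math


def project_to_plane(u_px, image_width_px, hfov_deg, distance_m):
    """Place a detection on a top-down plane in the boat's frame.

    u_px: horizontal pixel coordinate of the bounding-box centre.
    Returns (x_right, y_forward) in metres, assuming a pinhole camera
    that looks straight ahead (an assumption, not the paper's method).
    """
    # Focal length in pixels derived from the assumed horizontal field of view.
    focal_px = (image_width_px / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)
    # Bearing of the object relative to the optical axis (positive to the right).
    bearing = math.atan2(u_px - image_width_px / 2.0, focal_px)
    x_right = distance_m * math.sin(bearing)
    y_forward = distance_m * math.cos(bearing)
    return x_right, y_forward
```

An object at the image centre lands straight ahead at the predicted distance, while an object at the image edge lands at half the field of view off-axis.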