Dataset and Benchmark: Novel Sensors for Autonomous Vehicle Perception

Spencer Carmichael, Austin Buchan, Mani Ramanagopal, Radhika Ravi, Ram Vasudevan, Katherine A. Skinner

TL;DR

The paper introduces NSAVP, a novel autonomous-vehicle perception dataset that uniquely combines stereo thermal, stereo event, monochrome, and RGB cameras with high-precision ground-truth poses and opposing-viewpoint sequences. It details the platform hardware, synchronized multi-modal data capture, calibration methods, and ground-truth generation, and provides a concrete benchmarking example on place recognition using LoST-X and NetVLAD. The dataset addresses critical gaps in localization and mapping research under challenging lighting and motion conditions, enabling robust sensor fusion studies. By offering a comprehensive data format, software tools, and a published benchmark, NSAVP supports rapid evaluation and comparison of traditional and novel sensor modalities for AV perception, with planned expansion to lidar/IMU and adverse-weather data.

Abstract

Conventional cameras employed in autonomous vehicle (AV) systems support many perception tasks, but are challenged by low-light or high-dynamic-range scenes, adverse weather, and fast motion. Novel sensors, such as event and thermal cameras, offer capabilities with the potential to address these scenarios, but they remain to be fully exploited. This paper introduces the Novel Sensors for Autonomous Vehicle Perception (NSAVP) dataset to facilitate future research on this topic. The dataset was captured with a platform including stereo event, thermal, monochrome, and RGB cameras as well as a high-precision navigation system providing ground truth poses. The data was collected by repeatedly driving two ~8 km routes and includes varied lighting conditions and opposing viewpoint perspectives. We provide benchmarking experiments on the task of place recognition to demonstrate challenges and opportunities for novel sensors to enhance critical AV perception tasks. To our knowledge, the NSAVP dataset is the first to include stereo thermal cameras together with stereo event and monochrome cameras. The dataset and supporting software suite are available at: https://umautobots.github.io/nsavp

Paper Structure (14 sections, 2 equations, 9 figures, 3 tables)

Figures (9)

  • Figure 1: Example data from each camera across six sequences capturing two opposing driving directions under three lighting conditions: night, afternoon, and sunset. For each camera, a composite image formed from the three lighting conditions is shown, with event data represented as a time surface [gallego_event-based_2022]. All the data is from approximately the same location, as determined by the provided ground truth poses. These examples highlight the robustness of thermal cameras to illumination changes and of thermal and event cameras to direct sunlight, which produces saturation and blooming in conventional cameras.
  • Figure 2: Vehicle Sensor Platform. The labeled vehicle diagram shows the position and orientation of the symmetric Left and Right groups of vision sensors, Primary and Secondary POS-LV 420 antennas, and POS-LV 420 Wheel encoder. The sensor detail diagram shows the left sensor group, with a Thermal camera, Event camera, Monochrome camera, and Color (RGB) camera. The left and front vehicle views show the location of the system reference Base link frame. Coordinate frames use red, green, and blue arrows for x, y, and z axes respectively. CAD sources: Monochrome and RGB camera models (https://www.flir.com/support-center/iis/machine-vision/knowledge-base/technical-documentation-bfs-gige/); Thermal camera model (https://www.flir.com/support/products/adk/?pn=ADK&vn=40640U050-6PAAX#Downloads); Event camera model (https://grabcad.com/library/davis346-event-camera-1); Ford Fusion model, released with the Ford Multi-AV Seasonal Dataset [ford_multi_av] (https://github.com/Ford/AVData/blob/master/fusion_description/meshes/Ford_fusion.obj).
  • Figure 3: Comparison of thermal and event stereo depth estimate uncertainty across AV datasets, assuming the error model described in [matthies_error_1987]. Conventional stereo cameras from the established KITTI dataset [geiger_vision_2013] are also included for reference. NSAVP achieves the lowest predicted depth uncertainty.
  • Figure 4: Visualization of time synchronization signals sent to each camera modality. For simplicity, single vertical lines are used to represent the square wave edges (rising or falling) each sensor responds to. The durations of the blanking period and frequency of the event camera signal are not to scale.
  • Figure 5: Calibration board AprilTag detections (denoted with red circles) in synchronized images captured or reconstructed across all modalities on the left side of the vehicle.
  • ...and 4 more figures
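
The stereo depth uncertainty comparison in Figure 3 can be illustrated with a minimal sketch. It assumes the common first-order propagation of disparity noise through rectified stereo triangulation, Z = f·b/d, which gives σ_Z ≈ Z²/(f·b)·σ_d; the exact error model used for the figure is the one in the cited work and may differ. The function name, the 0.5 px disparity noise, and the focal length and baseline values below are hypothetical and are not NSAVP parameters.

```python
def stereo_depth_sigma(depth_m, focal_px, baseline_m, disparity_sigma_px=0.5):
    """Approximate depth standard deviation (meters) for a rectified stereo pair.

    Assumption: first-order propagation of disparity noise through
    Z = f * b / d, giving sigma_Z ~= Z**2 / (f * b) * sigma_d.
    All default and example values are illustrative, not taken from the paper.
    """
    if depth_m <= 0 or focal_px <= 0 or baseline_m <= 0:
        raise ValueError("depth, focal length, and baseline must be positive")
    if disparity_sigma_px < 0:
        raise ValueError("disparity noise must be non-negative")
    return depth_m ** 2 / (focal_px * baseline_m) * disparity_sigma_px


if __name__ == "__main__":
    # Hypothetical rig: 1000 px focal length, compared at two baselines.
    for baseline_m in (0.5, 1.0):
        for depth_m in (10.0, 20.0, 40.0):
            sigma = stereo_depth_sigma(depth_m, 1000.0, baseline_m)
            print(f"baseline={baseline_m} m, depth={depth_m} m, sigma={sigma:.3f} m")
```

Under this approximation, predicted uncertainty grows with the square of depth and shrinks in proportion to focal length and baseline, so a wider baseline or a longer focal length lowers the predicted uncertainty at a given range.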