M2P2: A Multi-Modal Passive Perception Dataset for Off-Road Mobility in Extreme Low-Light Conditions

Aniket Datar; Anuj Pokhrel; Mohammad Nazeri; Madhan B. Rao; Chenhui Pan; Yufan Zhang; Andre Harrison; Maggie Wigness; Philip R. Osteen; Jinwei Ye; Xuesu Xiao

TL;DR

M2P2 tackles the challenge of continuous, long-duration off-road autonomy under extreme low-light conditions by introducing a multi-modal passive perception dataset and sensor suite. It combines thermal, event, and stereo RGB sensing with GPS, IMUs, and LiDAR ground truth, alongside a novel calibration pipeline that brings all modalities into a common frame. The dataset comprises over 10 hours and 32 km of diverse terrain and lighting, with extensive ground-truth data and precise synchronization, enabling end-to-end learning, depth recovery, and passive odometry in darkness. Demonstrations include thermal-based behavior-cloning navigation, depth estimation from infrared imagery, and passive visual odometry, highlighting both the potential and the limitations of purely passive perception for off-road mobility. The work provides a foundational resource and methodology to advance covert, robust perception for autonomous off-road missions in no-light conditions.

Abstract

Long-duration, off-road, autonomous missions require robots to continuously perceive their surroundings regardless of the ambient lighting conditions. Most existing autonomy systems rely heavily on active sensing, e.g., LiDAR, RADAR, and Time-of-Flight sensors, or use (stereo) visible light imaging sensors, e.g., color cameras, to perceive environment geometry and semantics. In scenarios where fully passive perception is required and lighting conditions are degraded to the extent that visible light cameras fail to perceive, most downstream mobility tasks such as obstacle avoidance become impossible. To address this challenge, this paper presents a Multi-Modal Passive Perception dataset, M2P2, to enable off-road mobility in low-light to no-light conditions. We design a multi-modal sensor suite including thermal, event, and stereo RGB cameras, GPS, two Inertial Measurement Units (IMUs), as well as a high-resolution LiDAR for ground truth, with a novel multi-sensor calibration procedure that can efficiently transform multi-modal perceptual streams into a common coordinate system. Our 10-hour, 32 km dataset also includes mobility data such as robot odometry and actions and covers well-lit, low-light, and no-light conditions, along with paved, on-trail, and off-trail terrain. Our results demonstrate that off-road mobility is possible through only passive perception in extreme low-light conditions using end-to-end learning and classical planning. The project website can be found at https://cs.gmu.edu/~xiao/Research/M2P2/

Paper Structure (21 sections, 10 figures, 4 tables)

This paper contains 21 sections, 10 figures, 4 tables.

Introduction
Related Work
Off-Road Perception
Related Datasets
Multi-Modal Sensor Suite
Thermal Camera
Event Camera
Stereo RGB Cameras
IMUs
LiDAR for Ground Truth
Sensor Suite Calibration
Thermal Checkerboard
Event Reconstruction
Multi-Modal Synchronization
All-in-One Calibration Procedure
...and 6 more sections

Figures (10)

Figure 1: Multi-Modal Passive Perception Data Collection in an Off-Road Forest Environment in Complete Darkness. Top Left: Clearpath Husky with the Sensor Suite (flashlight for visualization only); Top Right: Thermal Image; Bottom Left: Event Stream; Bottom Middle: RGB Image (fails to perceive); Bottom Right: LiDAR Point Cloud (for ground truth).
Figure 2: Sensor Suite CAD (Left) and Hardware (Right).
Figure 3: Calibration Target (Thermal, Event, and RGB Image).
Figure 4: Multi-Modal Synchronization: LiDAR trigger synchronized to internal encoder angle ($\theta=360^\circ$) initiates frame acquisition at a rate of 10 Hz for RGB and thermal cameras, with event camera recording trigger edges for frame reconstruction.
Figure 5: Transformation Tree of the Sensor Suite: Solid arrows indicate direct hardware transformations, while dotted arrows represent transformations from our multi-modal calibration.
...and 5 more figures
