eNavi: Event-based Imitation Policies for Low-Light Indoor Mobile Robot Navigation

Prithvi Jai Ramesh; Kaustav Chanda; Krishna Vinod; Joseph Raj Vishal; Yezhou Yang; Bharatesh Chakravarthi

Abstract

Event cameras provide high dynamic range and microsecond-level temporal resolution, making them well-suited for indoor robot navigation, where conventional RGB cameras degrade under fast motion or low-light conditions. Despite advances in event-based perception spanning detection, SLAM, and pose estimation, there remains limited research on end-to-end control policies that exploit the asynchronous nature of event streams. To address this gap, we introduce a real-world indoor person-following dataset collected using a TurtleBot 2 robot, featuring synchronized raw event streams, RGB frames, and expert control actions across multiple indoor maps and trajectories, under both normal and low-light conditions. We further build a multimodal data preprocessing pipeline that temporally aligns event and RGB observations while reconstructing ground-truth actions from odometry to support high-quality imitation learning. Building on this dataset, we propose a late-fusion RGB-Event navigation policy that combines dual MobileNet encoders with a transformer-based fusion module trained via behavioral cloning. A systematic evaluation of RGB-only, Event-only, and RGB-Event fusion models across 12 training variations, ranging from single-path imitation to general multi-path imitation, shows that policies incorporating event data, particularly the fusion model, achieve improved robustness and lower action prediction error, especially in unseen low-light conditions where RGB-only models fail. We release the dataset, synchronization pipeline, and trained models at https://eventbasedvision.github.io/eNavi/

Paper Structure (27 sections, 5 equations, 4 figures, 3 tables)

Introduction
Related Works
Event-based Vision and Datasets for Robotic Perception
Multimodal Fusion for Robust Robot Navigation
Policy Learning for Robot Navigation
The eNavi Dataset
Hardware and Camera setup
Data Collection Protocol
Multimodal Data Processing Pipeline
Temporal Synchronization of RGB, Events, and Actions
Odometry-based Action Reconstruction
Event-based Navigation Policy
The ENP Architecture
Modality-specific Encoding
Attention-based Fusion and Control Head
...and 12 more sections

Figures (4)

Figure 1: Overview of the dataset generation workflow. (a) The process begins with teleoperated data collection using a mobile robot setup. (b) During collection, the user teleoperates the robot to follow another person across the room along three different paths ($P_1$, $P_2$, and $P_3$) under different lighting conditions.
Figure 2: Overview of the dataset generation workflow. The process begins with teleoperated data collection using a mobile robot setup. The recorded ROS 2 bags are analyzed for trajectory and velocity consistency (center panels) before being synchronized into paired image, event, and action tuples $\{(\mathbf{I}_k, \mathbf{E}_k, \mathbf{a}_k)\}$ stored in .h5 format.
Figure 3: The Event-based Navigation Policy (ENP-Fusion) Architecture. The model fuses a frozen RGB stream and a trainable event stream using a Transformer-based attention mechanism. The system takes synchronized RGB frames ($320 \times 180 \times 3$) and event frames ($320 \times 180 \times 2$) as input and outputs continuous differential-drive commands ($v, \omega$) via an MLP head. The snowflake icon denotes that the RGB backbone weights remain fixed during training to prevent overfitting in low-light scenarios.
Figure 4: Trajectory Analysis of ENP-Fusion Policy. Comparison of predicted linear velocity (m/s) and angular rate (rad/s) against expert ground truth for a representative test trajectory ($P_3$). (a) and (b) contrast performance under normal-light (top) and low-light (bottom) conditions.
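
The abstract and the Figure 2 caption describe reconstructing actions from odometry and synchronizing RGB frames, event frames, and actions into tuples $\{(\mathbf{I}_k, \mathbf{E}_k, \mathbf{a}_k)\}$. The following is a minimal sketch of one way such a step could look, not the released pipeline. It assumes nearest-timestamp matching and finite-difference velocities; the function names, the `(t, x, y, yaw)` odometry layout, and the `max_gap` tolerance are all hypothetical.

```python
import bisect
import math


def nearest_index(timestamps, t):
    """Index of the entry in sorted, non-empty `timestamps` closest to t."""
    i = bisect.bisect_left(timestamps, t)
    if i == 0:
        return 0
    if i == len(timestamps):
        return len(timestamps) - 1
    # Pick the closer of the two neighbours around t.
    return i if timestamps[i] - t < t - timestamps[i - 1] else i - 1


def actions_from_odometry(odom):
    """Estimate (t, v, omega) from consecutive (t, x, y, yaw) odometry samples.

    Assumption: finite differences between neighbouring samples.
    """
    actions = []
    for (t0, x0, y0, th0), (t1, x1, y1, th1) in zip(odom, odom[1:]):
        dt = t1 - t0
        if dt <= 0:
            continue  # skip duplicated or out-of-order samples
        # Signed forward speed: displacement projected onto the heading at t0.
        v = ((x1 - x0) * math.cos(th0) + (y1 - y0) * math.sin(th0)) / dt
        # Wrap the yaw difference to [-pi, pi] before dividing by dt.
        dth = math.atan2(math.sin(th1 - th0), math.cos(th1 - th0))
        actions.append((t0, v, dth / dt))
    return actions


def synchronize(rgb_ts, event_ts, actions, max_gap=0.05):
    """Pair each RGB timestamp with the nearest event frame and action.

    Returns (rgb_index, event_index, (v, omega)) tuples; an RGB frame is
    dropped when either match is further away than `max_gap` seconds
    (hypothetical tolerance).
    """
    if not event_ts or not actions:
        return []
    action_ts = [a[0] for a in actions]
    tuples = []
    for k, t in enumerate(rgb_ts):
        e = nearest_index(event_ts, t)
        a = nearest_index(action_ts, t)
        if abs(event_ts[e] - t) <= max_gap and abs(action_ts[a] - t) <= max_gap:
            tuples.append((k, e, actions[a][1:]))
    return tuples
```

In this sketch the RGB timestamps act as the reference clock, so the number of output tuples never exceeds the number of RGB frames. Storing the resulting tuples in .h5 format is left out.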
