EVI-SAM: Robust, Real-time, Tightly-coupled Event-Visual-Inertial State Estimation and 3D Dense Mapping
Weipeng Guan, Peiyu Chen, Huibin Zhao, Yu Wang, Peng Lu
TL;DR
EVI-SAM addresses robust, real-time 6-DoF pose tracking and dense 3D mapping with a monocular event camera by fusing events, images, and IMU measurements in a tightly-coupled hybrid framework. It combines an event-based 2D-2D photometric alignment constraint with an event-based reprojection constraint in a sliding-window optimization, and introduces an image-guided dense mapping pipeline that reconstructs dense depth and texture via region-growing inpainting and TSDF fusion. The work claims to be the first non-learning approach for monocular event-based dense mapping and reports strong tracking and mapping performance across HDR and aggressive-motion scenarios, including onboard handheld evaluation. The reported results indicate improved robustness and map density over existing event-based and image-based baselines, with practical implications for real-time navigation and obstacle avoidance in challenging environments.
Abstract
Event cameras are bio-inspired, motion-activated sensors that demonstrate substantial potential in handling challenging situations, such as motion blur and high dynamic range. In this paper, we propose EVI-SAM to tackle the problem of 6-DoF pose tracking and 3D reconstruction using a monocular event camera. A novel event-based hybrid tracking framework is designed to estimate the pose, leveraging the robustness of feature matching and the precision of direct alignment. Specifically, we develop an event-based 2D-2D alignment to construct the photometric constraint, and tightly integrate it with the event-based reprojection constraint. The mapping module recovers dense, colored depth of the scene through an image-guided event-based mapping method. Subsequently, the appearance, texture, and surface mesh of the 3D scene can be reconstructed by fusing the dense depth maps from multiple viewpoints using truncated signed distance function (TSDF) fusion. To the best of our knowledge, this is the first non-learning work to realize event-based dense mapping. Numerical evaluations are performed on both publicly available and self-collected datasets, which qualitatively and quantitatively demonstrate the superior performance of our method. Our EVI-SAM effectively balances accuracy and robustness while maintaining computational efficiency, showcasing superior pose tracking and dense mapping performance in challenging scenarios. Video Demo: https://youtu.be/Nn40U4e5Si8.
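
As an illustration of the TSDF fusion step mentioned in the abstract, the following is a minimal sketch of the generic technique, not the authors' implementation. It assumes a simplified 1D setting (voxels sampled along a single camera ray), unit observation weights, and a running weighted average update; the function names `integrate_depth` and `extract_surface`, the voxel spacing, and the truncation distance are all hypothetical choices made for the example.

```python
def integrate_depth(tsdf, weights, voxel_depths, observed_depth, trunc):
    """Fuse one depth observation into a 1D TSDF sampled along a camera ray."""
    for i, z in enumerate(voxel_depths):
        sdf = observed_depth - z              # positive in front of the surface
        if sdf < -trunc:
            continue                          # far behind the surface: not observed
        d = min(1.0, sdf / trunc)             # truncate and normalize to [-1, 1]
        w = weights[i]
        tsdf[i] = (tsdf[i] * w + d) / (w + 1.0)   # running average, unit weight (assumption)
        weights[i] = w + 1.0


def extract_surface(tsdf, weights, voxel_depths):
    """Return the depth of the first positive-to-negative zero crossing, or None."""
    for i in range(len(tsdf) - 1):
        observed = weights[i] > 0 and weights[i + 1] > 0
        if observed and tsdf[i] > 0 >= tsdf[i + 1]:
            # Linear interpolation between the two voxels that bracket the surface.
            t = tsdf[i] / (tsdf[i] - tsdf[i + 1])
            return voxel_depths[i] + t * (voxel_depths[i + 1] - voxel_depths[i])
    return None


# Hypothetical example: three noisy depth observations of a surface near 2 m.
voxel_depths = [0.1 * k for k in range(31)]   # voxel centers from 0.0 m to 3.0 m
tsdf = [1.0] * len(voxel_depths)
weights = [0.0] * len(voxel_depths)
for depth in (1.96, 2.00, 2.04):
    integrate_depth(tsdf, weights, voxel_depths, depth, trunc=0.3)
surface = extract_surface(tsdf, weights, voxel_depths)
```

Averaging the truncated distances per voxel is what lets depth maps from multiple viewpoints reinforce each other: noise in individual observations is smoothed out, and the fused surface is read off at the zero crossing. A full 3D pipeline would additionally project each voxel into every depth map using the estimated camera pose and would store color alongside the distance value.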
