E3D: Event-Based 3D Shape Reconstruction
Alexis Baudron, Zihao W. Wang, Oliver Cossairt, Aggelos K. Katsaggelos
TL;DR
This work tackles dense 3D shape reconstruction from low-power event cameras by casting the problem as multi-view silhouette reconstruction. It introduces an Event-to-Silhouette (E2S) network and E3D, a framework driven by differentiable rendering that jointly optimizes silhouettes, camera pose and a 3D mesh, supported by a synthetic 3D-to-event data generator. On synthetic ShapeNet data and in real CeleX experiments, the method improves mesh quality and pose estimation and is resilient to motion blur, although a notable sim-to-real gap remains. The approach enables edge-friendly 3D reconstruction for AR/VR and paves the way for event-based 3D sensing with silhouette priors.
Abstract
3D shape reconstruction is a primary component of augmented/virtual reality. Although highly advanced, existing solutions based on RGB, RGB-D and Lidar sensors are power- and data-intensive, which makes them difficult to deploy on edge devices. We approach 3D reconstruction with an event camera, a sensor with significantly lower power consumption, latency and data overhead that also offers high dynamic range. While previous event-based 3D reconstruction methods are primarily based on stereo vision, we cast the problem as multi-view shape from silhouette using a monocular event camera. The output of a moving event camera is a sparse point set of space-time gradients that largely traces scene and object edges and contours. First, we introduce an event-to-silhouette (E2S) neural network module that transforms a stack of event frames into the corresponding silhouettes, with additional neural branches for camera pose regression. Second, we introduce E3D, which employs a 3D differentiable renderer (PyTorch3D) to enforce cross-view 3D mesh consistency and to fine-tune the E2S and pose networks. Lastly, we introduce a 3D-to-events simulation pipeline and apply it to publicly available object datasets to generate synthetic event/silhouette training pairs for supervised learning.
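The abstract does not say how events are grouped into the "stack of event frames" that E2S consumes. A common choice, assumed here, is to split a time window into equal bins and sum the signed polarities of the events falling on each pixel in each bin. The sketch below illustrates that binning; the `(x, y, t, polarity)` event tuple and the function `events_to_frame_stack` are hypothetical names, not the authors' code.

```python
from typing import List, Tuple

# Hypothetical event representation: (x, y, timestamp, polarity in {+1, -1}).
Event = Tuple[int, int, float, int]


def events_to_frame_stack(
    events: List[Event],
    height: int,
    width: int,
    t_start: float,
    t_end: float,
    num_bins: int,
) -> List[List[List[int]]]:
    """Assumed binning scheme: split [t_start, t_end) into num_bins equal
    slices and accumulate signed polarities per pixel in each slice."""
    if t_end <= t_start or num_bins < 1:
        raise ValueError("need t_end > t_start and num_bins >= 1")
    stack = [[[0] * width for _ in range(height)] for _ in range(num_bins)]
    duration = t_end - t_start
    for x, y, t, polarity in events:
        if not (t_start <= t < t_end):
            continue  # ignore events outside the time window
        # Map the timestamp to a bin index; clamp guards against rounding at the edge.
        b = min(int((t - t_start) / duration * num_bins), num_bins - 1)
        stack[b][y][x] += polarity
    return stack
```

For example, with a 2x2 sensor, a window of [0, 1) and two bins, an event at t = 0.1 lands in the first frame and events at t = 0.6 to 0.9 land in the second. The resulting `num_bins` frames would then be stacked as input channels to the network.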
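E3D enforces cross-view consistency by comparing silhouettes rendered from the current mesh and pose with the silhouettes predicted by E2S. The abstract does not state the exact loss. This sketch assumes a soft IoU loss averaged over views, a common choice for silhouette-based mesh fitting. In PyTorch3D, a rendered silhouette would typically be the alpha channel of an image produced with `SoftSilhouetteShader`. Here plain nested lists stand in for both masks, and `soft_iou_loss` and `multiview_silhouette_loss` are hypothetical names.

```python
from typing import List

Mask = List[List[float]]  # H x W silhouette with values in [0, 1]


def soft_iou_loss(pred: Mask, target: Mask, eps: float = 1e-6) -> float:
    """1 - soft intersection-over-union between two silhouettes."""
    inter = 0.0
    union = 0.0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            inter += p * t
            union += p + t - p * t  # soft union: p + t minus the overlap
    return 1.0 - inter / (union + eps)


def multiview_silhouette_loss(preds: List[Mask], targets: List[Mask]) -> float:
    """Average the per-view loss so every camera view constrains the same mesh."""
    if not preds or len(preds) != len(targets):
        raise ValueError("need the same, non-zero number of predicted and target views")
    return sum(soft_iou_loss(p, t) for p, t in zip(preds, targets)) / len(preds)
```

With the same formula written as tensor operations in an autodiff framework, gradients of this loss flow back through the differentiable renderer to the mesh vertices and camera pose.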
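The abstract does not detail the 3D-to-events simulation pipeline. Its core step can be sketched with the idealized event-camera model used by common simulators: a pixel emits an event of polarity ±1 each time its log intensity moves by a contrast threshold $C$ away from the level at which it last fired, i.e. when $|\log I(x, t) - \log I_{\text{ref}}(x)| \ge C$. This sketch makes several simplifying assumptions. It takes rendered intensity frames as input and uses a single fixed threshold. It stamps every event with its frame's timestamp instead of interpolating between frames, and it models no sensor noise. `simulate_events` and its parameters are hypothetical.

```python
import math
from typing import List, Tuple

Event = Tuple[int, int, float, int]  # (x, y, timestamp, polarity)
Frame = List[List[float]]  # H x W intensity image, e.g. a rendered view of a mesh


def simulate_events(
    frames: List[Frame],
    timestamps: List[float],
    threshold: float = 0.2,
    eps: float = 1e-3,
) -> List[Event]:
    """Idealized contrast-threshold event model (assumed, not the paper's code)."""
    if len(frames) != len(timestamps) or not frames:
        raise ValueError("need one timestamp per frame and at least one frame")
    height, width = len(frames[0]), len(frames[0][0])
    # Per-pixel reference log intensity, initialized from the first frame.
    ref = [[math.log(frames[0][y][x] + eps) for x in range(width)] for y in range(height)]
    events: List[Event] = []
    for frame, t in zip(frames[1:], timestamps[1:]):
        for y in range(height):
            for x in range(width):
                diff = math.log(frame[y][x] + eps) - ref[y][x]
                polarity = 1 if diff > 0 else -1
                # Emit one event per full threshold crossing.
                n = int(abs(diff) // threshold)
                events.extend([(x, y, t, polarity)] * n)
                # Move the reference level by the crossings that fired.
                ref[y][x] += polarity * n * threshold
    return events
```

For instance, a pixel whose log intensity rises by 0.5 and then returns to its start value emits two positive events and then two negative ones with `threshold=0.2`. The leftover 0.1 stays below the threshold until further change accumulates.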
