SMORE: Simultaneous Map and Object REconstruction
Nathaniel Chodosh, Anish Madan, Simon Lucey, Deva Ramanan
TL;DR
SMORE reconstructs dynamic, large-scale urban scenes from LiDAR by decomposing each scene into rigidly moving objects and a static background, then jointly optimizing geometry and motion. It minimizes a global 3D point-to-surface objective by coordinate descent, alternating between registration and surface reconstruction steps that off-the-shelf methods handle without retraining, and uses a rolling-shutter deskewing model that extends to dynamic actors. The method achieves order-of-magnitude improvements over prior art in LiDAR novel-view synthesis and demonstrates robust ego- and actor-pose estimation, with practical applications in auto-labeling partially annotated sequences and generating ground truth for depth completion and scene flow. The resulting reconstructions are high-fidelity and time-consistent, suitable for downstream perception tasks in autonomous driving.
Abstract
We present a method for dynamic surface reconstruction of large-scale urban scenes from LiDAR. Depth-based reconstruction methods tend to focus either on small-scale objects or on large-scale SLAM reconstructions that treat moving objects as outliers. We take a holistic perspective and optimize a compositional model of a dynamic scene that decomposes the world into rigidly-moving objects and the background. To achieve this, we take inspiration from recent novel view synthesis methods and frame the reconstruction problem as a global optimization over neural surfaces, ego poses, and object poses, which minimizes the error between composed spacetime surfaces and input LiDAR scans. In contrast to view synthesis methods, which typically minimize 2D errors with gradient descent, we minimize a 3D point-to-surface error by coordinate descent, which we decompose into registration and surface reconstruction steps. Each step can be handled well by off-the-shelf methods without any re-training. We analyze the surface reconstruction step for rolling-shutter LiDARs, and show that deskewing operations common in continuous-time SLAM can be applied to dynamic objects as well, improving results over prior art by an order of magnitude. Beyond pursuing dynamic reconstruction as a goal in and of itself, we propose that such a system can be used to auto-label partially annotated sequences and produce ground-truth annotations for hard-to-label problems such as depth completion and scene flow. Please see https://anishmadan23.github.io/smore/ for more visual results.
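To make the deskewing idea concrete, the following is a minimal illustrative sketch, not the authors' implementation. It assumes planar SE(2) poses written as (x, y, yaw), constant-velocity motion within a sweep, and a normalized per-point timestamp alpha in [0, 1]. All function names are hypothetical. Each LiDAR return is mapped into the object's own frame using the ego pose and the object pose interpolated at that return's timestamp; passing an identity object pose recovers ordinary deskewing of the static background into the world frame.

```python
import math

def interpolate_pose(pose_start, pose_end, alpha):
    """Interpolate an SE(2) pose (x, y, yaw) under a constant-velocity assumption."""
    x0, y0, th0 = pose_start
    x1, y1, th1 = pose_end
    # Wrap the yaw difference to (-pi, pi] so we rotate the short way round.
    dth = math.atan2(math.sin(th1 - th0), math.cos(th1 - th0))
    return (x0 + alpha * (x1 - x0), y0 + alpha * (y1 - y0), th0 + alpha * dth)

def apply_pose(pose, point):
    """Transform a 2D point by the rigid motion (x, y, yaw)."""
    x, y, th = pose
    px, py = point
    c, s = math.cos(th), math.sin(th)
    return (c * px - s * py + x, s * px + c * py + y)

def invert_pose(pose):
    """Inverse rigid motion: (R, t) -> (R^T, -R^T t)."""
    x, y, th = pose
    c, s = math.cos(th), math.sin(th)
    return (-(c * x + s * y), -(-s * x + c * y), -th)

def deskew_to_object_frame(points, alphas, ego_start, ego_end, obj_start, obj_end):
    """Map sensor-frame returns into the object frame, one timestamp per return.

    points : list of (x, y) returns in the sensor frame at their capture time
    alphas : normalized capture times in [0, 1] within the sweep
    ego_*  : sensor-to-world poses at the start and end of the sweep
    obj_*  : object-to-world poses at the start and end of the sweep
    """
    deskewed = []
    for point, alpha in zip(points, alphas):
        ego_t = interpolate_pose(ego_start, ego_end, alpha)
        obj_t = interpolate_pose(obj_start, obj_end, alpha)
        world = apply_pose(ego_t, point)            # sensor frame -> world
        deskewed.append(apply_pose(invert_pose(obj_t), world))  # world -> object
    return deskewed

# Usage: a stationary sensor observes an object that moves 1 m along x
# during the sweep; the two returns hit the same object-frame x coordinate.
identity = (0.0, 0.0, 0.0)
returns = [(0.5, 0.2), (1.5, -0.2)]   # captured at sweep start and sweep end
print(deskew_to_object_frame(returns, [0.0, 1.0], identity, identity,
                             identity, (1.0, 0.0, 0.0)))
```

In this toy setup the raw returns appear smeared by 1 m along x, while the deskewed returns share the same object-frame x coordinate. A 3D version would replace the yaw interpolation with rotation interpolation on SO(3).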
