gradSLAM: Automagically differentiable SLAM
Krishna Murthy Jatavallabhula, Soroush Saryazdi, Ganesh Iyer, Liam Paull
TL;DR
gradSLAM reframes dense SLAM as a fully differentiable computational graph, so that gradients can flow from 3D maps back to the 2D image and depth measurements. It replaces non-differentiable blocks with a differentiable nonlinear least squares solver (∇LM) and differentiable mapping, fusion, and ray backprojection, and demonstrates differentiable variants of three dense SLAM systems: KinectFusion, PointFusion, and ICP-SLAM. In the reported experiments, the differentiable implementations perform similarly to their non-differentiable counterparts while providing explicit gradients through the entire SLAM pipeline, which enables self-supervised and task-driven learning. The work is released as an open-source PyTorch framework to foster spatially grounded learning and broader adoption in downstream vision-and-action tasks.
Abstract
Blending representation learning approaches with simultaneous localization and mapping (SLAM) systems is an open question, because of their highly modular and complex nature. Functionally, SLAM is an operation that transforms raw sensor inputs into a distribution over the state(s) of the robot and the environment. If this transformation (SLAM) were expressible as a differentiable function, we could leverage task-based error signals to learn representations that optimize task performance. However, several components of a typical dense SLAM system are non-differentiable. In this work, we propose gradSLAM, a methodology for posing SLAM systems as differentiable computational graphs, which unifies gradient-based learning and SLAM. We propose differentiable trust-region optimizers, surface measurement and fusion schemes, and raycasting, without sacrificing accuracy. This amalgamation of dense SLAM with computational graphs enables us to backprop all the way from 3D maps to 2D pixels, opening up new possibilities in gradient-based learning for SLAM.
TL;DR: We leverage the power of automatic differentiation frameworks to make dense SLAM differentiable.
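To make the idea of a differentiable trust-region optimizer concrete, the following is a minimal sketch on a toy scalar problem, not gradSLAM's implementation. A classical Levenberg-Marquardt iteration accepts or rejects each step with a hard comparison, which has zero gradient almost everywhere; here that decision is replaced by a smooth logistic gate. The residual, the function names (`residual`, `soft_lm_step`, `finite_diff`), the gate form, and the constants `lam` and `sharpness` are all illustrative assumptions. A central finite difference stands in for automatic differentiation so that the sketch needs only the Python standard library.

```python
import math

def residual(x, target):
    # Toy scalar residual r(x) = x^2 - target (hypothetical problem).
    return x * x - target

def soft_lm_step(x, target, lam=1e-2, sharpness=10.0):
    """One damped Gauss-Newton (LM-style) step with a smooth accept gate."""
    r0 = residual(x, target)
    jac = 2.0 * x                              # dr/dx for the toy residual
    delta = -(jac * r0) / (jac * jac + lam)    # 1D damped normal equations
    r1 = residual(x + delta, target)
    # A hard rule would be: gate = 1.0 if r1**2 < r0**2 else 0.0.
    # Smooth alternative: logistic of the squared-error decrease, written
    # with tanh so that large arguments cannot overflow.
    z = sharpness * (r0 * r0 - r1 * r1)
    gate = 0.5 * (1.0 + math.tanh(0.5 * z))
    return x + gate * delta

def finite_diff(f, t, eps=1e-6):
    # Central difference, standing in for autodiff in this sketch.
    return (f(t + eps) - f(t - eps)) / (2.0 * eps)

# The update is a smooth function of the measurement ("target"), so an
# error signal defined on x_new can be propagated back to the input.
x_new = soft_lm_step(3.0, 4.0)
grad_wrt_target = finite_diff(lambda t: soft_lm_step(3.0, t), 4.0)
print(x_new, grad_wrt_target)
```

In an automatic differentiation framework the same step would be written with tensor operations, and the derivative with respect to the input measurement would come from backpropagation rather than finite differences.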
