gradSLAM: Automagically differentiable SLAM

Krishna Murthy Jatavallabhula, Soroush Saryazdi, Ganesh Iyer, Liam Paull

TL;DR

gradSLAM reframes dense SLAM as a fully differentiable computational graph, so that gradients can flow from 3D maps back to 2D image and depth measurements. It introduces a differentiable nonlinear least squares solver (∇LM) along with differentiable mapping, fusion, and raycasting to replace non-differentiable blocks, and demonstrates differentiable variants of three dense SLAM systems: KinectFusion, PointFusion, and ICP-SLAM. In the reported experiments, the differentiable implementations perform similarly to their non-differentiable counterparts while providing explicit gradients through the entire SLAM pipeline, which enables self-supervised and task-driven learning. The work is released as an open-source PyTorch framework to foster spatially grounded learning and adoption in downstream vision-and-action tasks.

Abstract

Blending representation learning approaches with simultaneous localization and mapping (SLAM) systems is an open question, because of their highly modular and complex nature. Functionally, SLAM is an operation that transforms raw sensor inputs into a distribution over the state(s) of the robot and the environment. If this transformation (SLAM) were expressible as a differentiable function, we could leverage task-based error signals to learn representations that optimize task performance. However, several components of a typical dense SLAM system are non-differentiable. In this work, we propose gradSLAM, a methodology for posing SLAM systems as differentiable computational graphs, which unifies gradient-based learning and SLAM. We propose differentiable trust-region optimizers, surface measurement and fusion schemes, and raycasting, without sacrificing accuracy. This amalgamation of dense SLAM with computational graphs enables us to backprop all the way from 3D maps to 2D pixels, opening up new possibilities in gradient-based learning for SLAM. TL;DR: We leverage the power of automatic differentiation frameworks to make dense SLAM differentiable.

Paper Structure

This paper contains 27 sections, 4 equations, 14 figures, 2 tables.

Figures (14)

  • Figure 1: $\nabla$SLAM (gradSLAM) is a fully differentiable dense simultaneous localization and mapping (SLAM) system. The central idea of $\nabla$SLAM is to construct a computational graph representing every operation in a dense SLAM system. We propose differentiable alternatives to several non-differentiable components of traditional dense SLAM systems, such as optimization, odometry estimation, raycasting, and map fusion. This creates a pathway for gradient-flow from 3D map elements to sensor observations (e.g., pixels). We implement differentiable variants of three dense SLAM systems that operate on voxels, surfels, and pointclouds respectively. $\nabla$SLAM thus is a novel paradigm to integrate representation learning approaches with classical SLAM.
  • Figure 2: A computational graph. Nodes in red represent variables. Nodes in blue represent operations on variables. Edges represent data flow. This graph computes the function $3(xy+z)$. Dashed lines indicate (local, i.e., per-node) gradients in the backward pass.
  • Figure 3: Computational graph for $\nabla$LM
  • Figure 4: An example curve fitting problem, showing that $\nabla$LM performs near-identically to LM, with the added advantage of being fully differentiable.
  • Figure 5: Computation graph for the differentiable mapping module. The uncolored boxes indicate intermediate variables, while the colored boxes indicate processing blocks. Note that the specific choice of the functions for surface measurement update and map fusion depends on the map representation used.
  • ...and 9 more figures
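
The computational graph described in the Figure 2 caption, which computes $3(xy+z)$, can be traced by hand. The following is a minimal plain-Python sketch of its forward and backward passes. The function name `forward_backward` and the intermediate names `a` and `b` are illustrative assumptions, not taken from the paper, and an automatic differentiation framework would derive the backward pass automatically rather than requiring it to be written out.

```python
# Hand-written forward and backward pass for f(x, y, z) = 3 * (x*y + z).
# All names here are hypothetical and chosen only for illustration.

def forward_backward(x, y, z):
    # Forward pass: evaluate each operation node in order.
    a = x * y      # multiply node
    b = a + z      # add node
    f = 3.0 * b    # scale node

    # Backward pass: multiply local (per-node) gradients from output to inputs.
    df_db = 3.0            # d(3b)/db
    df_da = df_db * 1.0    # d(a + z)/da = 1
    df_dz = df_db * 1.0    # d(a + z)/dz = 1
    df_dx = df_da * y      # d(xy)/dx = y
    df_dy = df_da * x      # d(xy)/dy = x
    return f, (df_dx, df_dy, df_dz)


value, grads = forward_backward(2.0, 5.0, 1.0)
print(value, grads)  # prints: 33.0 (15.0, 6.0, 3.0)
```

Each gradient is the product of the local derivatives along the path from the output to that input, which is the chain rule applied along the dashed backward edges in the figure.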