
From Variance to Veracity: Unbundling and Mitigating Gradient Variance in Differentiable Bundle Adjustment Layers

Swaminathan Gurumurthy, Karnik Ram, Bingqing Chen, Zachary Manchester, Zico Kolter

TL;DR

The paper addresses training instability in pose estimation pipelines that couple learned correspondences with differentiable bundle adjustment (BA), identifying gradient variance as a key bottleneck. It analyzes three sources of variance (flow loss interference, BA linearization errors, and the dependence of weight gradients on the BA residual) and introduces a simple yet effective fix: weight the flow loss with the network-predicted weights $\Sigma_{jk}$ used in the inner-loop BA, under a stop-gradient, complemented by periodic balancing of the loss contributions via $\beta$. This variance-reduction strategy yields 2–2.5× training speedups on DPVO, improves stability, and maintains or improves pose accuracy on TartanAir and related benchmarks, with gains that transfer to other BA-based pipelines such as DROID-SLAM. The work highlights the broader potential of variance-aware training for optimization-augmented neural networks and suggests directions for further study of implicit-layer training dynamics and robustness to domain shift.
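The weighted flow loss described above can be sketched as follows. This is a minimal NumPy sketch, not the authors' code: it assumes a squared-error flow loss over N patches with per-coordinate weights standing in for $\Sigma_{jk}$, and it models the stop-gradient by treating the weights as constants when differentiating. The `rebalance_beta` rule (scale $\beta$ so the weighted flow term matches the pose term) is a hypothetical stand-in, since the exact $\beta$ schedule is not given here.

```python
import numpy as np


def weighted_flow_loss(flow_pred, flow_gt, weights):
    """Weighted squared flow error and its gradient w.r.t. flow_pred.

    flow_pred, flow_gt: (N, 2) predicted / ground-truth 2D flow per patch.
    weights: (N, 2) confidences, assumed to be the same weights the BA
    layer consumes (standing in for Sigma_jk).
    """
    # Stop-gradient: the weights enter only as constants, so no gradient
    # is computed for (or returned to) them.
    w = np.asarray(weights, dtype=float)
    err = np.asarray(flow_pred, dtype=float) - np.asarray(flow_gt, dtype=float)
    n = err.shape[0]
    loss = float(np.sum(w * err**2) / n)
    grad_flow = 2.0 * w * err / n  # d loss / d flow_pred
    return loss, grad_flow


def rebalance_beta(pose_loss, flow_loss, eps=1e-8):
    """Hypothetical rule: pick beta so beta * flow_loss matches pose_loss."""
    return pose_loss / (flow_loss + eps)


def total_loss(pose_loss, flow_loss, beta):
    """Combined objective; beta is held fixed between periodic rebalances."""
    return pose_loss + beta * flow_loss


# Toy usage: the third patch is an outlier that the BA weights down-weight.
flow_gt = np.zeros((3, 2))
flow_pred = np.array([[0.1, 0.0], [0.0, -0.1], [5.0, 5.0]])
weights = np.array([[1.0, 1.0], [1.0, 1.0], [0.01, 0.01]])
loss, grad = weighted_flow_loss(flow_pred, flow_gt, weights)
print(loss)     # about 0.1733 (unweighted it would be about 16.67)
print(grad[2])  # outlier gradient shrunk by its weight of 0.01
```

In a training loop, one would recompute $\beta$ only every K iterations (K is a hypothetical hyperparameter) and keep it fixed in between, so the balancing does not itself inject step-to-step noise.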

Abstract

Various pose estimation and tracking problems in robotics can be decomposed into a correspondence estimation problem (often computed using a deep network) followed by a weighted least-squares optimization problem to solve for the poses. Recent work has shown that coupling the two problems, by iteratively refining one conditioned on the other's output, yields SOTA results across domains. However, training these models has proved challenging, requiring a litany of tricks to stabilize and speed up training. In this work, we take the visual odometry problem as an example and identify three plausible causes: (1) flow loss interference, (2) linearization errors in the bundle adjustment (BA) layer, and (3) dependence of weight gradients on the BA residual. We show how these issues result in noisy, higher-variance gradients, potentially slowing down training and causing instabilities. We then propose a simple yet effective solution to reduce the gradient variance: using the weights predicted by the network in the inner optimization loop to weight the correspondence objective in the training problem. This helps the training objective 'focus' on the more important points, thereby reducing the variance and mitigating the influence of outliers. We show that the resulting method trains faster and can be trained more flexibly in varying training setups without sacrificing performance. In particular, we show $2$–$2.5\times$ training speedups over the baseline visual odometry model we modify.
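The inner weighted least-squares problem, and why its weights control the influence of outliers, can be illustrated with a toy example. This is a sketch under simplifying assumptions: a 2D translation stands in for the poses and depths of a real BA layer, and the names `weighted_gauss_newton_step` and `estimate_translation` are hypothetical. Because the toy residual is linear in the translation, a single Gauss-Newton step reaches the weighted least-squares optimum.

```python
import numpy as np


def weighted_gauss_newton_step(J, r, w):
    """One Gauss-Newton step for min_x 0.5 * sum_i w_i * r_i(x)^2.

    J: (M, D) Jacobian of the stacked residuals, r: (M,) residuals,
    w: (M,) non-negative weights. Solves (J^T W J) dx = -J^T W r.
    """
    JW = J.T * w          # (D, M): J^T W without forming diag(w)
    H = JW @ J            # Gauss-Newton approximation of the Hessian
    g = JW @ r            # gradient J^T W r
    return np.linalg.solve(H, -g)


def estimate_translation(src, dst, w):
    """Toy pose problem: find a 2D translation t with dst_i ~= src_i + t."""
    t = np.zeros(2)
    r = ((src + t) - dst).reshape(-1)      # residuals stacked x0, y0, x1, y1, ...
    J = np.tile(np.eye(2), (len(src), 1))  # d r / d t is the identity per point
    w_stacked = np.repeat(w, 2)            # same weight on x and y of a point
    return t + weighted_gauss_newton_step(J, r, w_stacked)


# Three correspondences agree on t = (1, 2); the fourth is an outlier.
src = np.zeros((4, 2))
dst = np.array([[1.0, 2.0], [1.0, 2.0], [1.0, 2.0], [11.0, -8.0]])
t_uniform = estimate_translation(src, dst, np.ones(4))
t_weighted = estimate_translation(src, dst, np.array([1.0, 1.0, 1.0, 0.001]))
print(t_uniform)   # pulled far off by the outlier: [ 3.5 -0.5]
print(t_weighted)  # close to the inlier answer (1, 2)
```

With uniform weights the single outlier drags the estimate far from the inliers' answer; a low weight on that correspondence nearly removes its influence, which is the same lever the paper reuses when it weights the correspondence objective during training.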

Paper Structure (29 sections, 23 equations, 11 figures, 3 tables)

Figures (11)

  • Figure 1: We propose a simple yet effective solution to stabilize and speed up the training of SOTA pose estimation methods. (b) We first analyze the causes of their instability, which are related to variance in their gradients, and (a) then mitigate them by using weights from the inner-loop optimization to weight the outer correspondence objective, which leads to improved performance.
  • Figure 2: (a) We compute the signal-to-noise ratio (SNR) in the loss gradients as we artificially add depth noise while linearizing the BA problem for gradient computation. We observe that the SNR in the flow loss deteriorates rapidly, indicating its sensitivity to linearization errors. (b) We artificially add noise to a subset of depths right before the flow loss computation and show the resulting average gradient errors on all the pose and 'clean' depth variables. The error in the pose gradients increases monotonically with the added noise, showing the impact 'outliers' have on the gradients of even the 'inlier' variables. (c) Similar to (b), here we add noise to the first frame's pose and show the gradient errors on the remaining frames and depths.
  • Figure 3: We compute the signal-to-noise ratio in the gradients of the flow loss and the weighted flow loss w.r.t. the flow network parameters at different training iterations of the base model. Specifically, we use the weights of the last linear layer of the network's flow computation head. We find that the weighted flow loss gradients have a higher SNR throughout training, especially in the initial iterations, when the outlier count is very high.
  • Figure 4: We observe that DPVO, when trained with our weighted flow loss, trains much faster, reaching $\sim\!0.2$ m accuracy in only $80$K iterations, and is much more stable. We report the median ATE across three trials on the validation split of TartanAir.
  • Figure 5: We retrain DPVO with and without the modified flow loss in the non-streaming batch setting and evaluate both models on validation sequences from TartanAir in the streaming setting. We observe that, beyond training faster and being more stable, the modified version generalizes better than the original model. This allows the model to be trained on shorter sequences without suffering high performance drops, thanks to the reduced gradient variance. We report the median ATE across three trials.
  • ...and 6 more figures
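The gradient signal-to-noise ratio referred to in Figures 2 and 3 can be sketched as below. The exact definition used in the paper is not stated in this summary; the sketch assumes one common choice, the norm of the mean gradient divided by the square root of the summed per-parameter variance across gradient samples. The toy treats per-point gradients of a squared flow error as the samples, whereas Figure 3 samples gradients w.r.t. the flow head's last linear layer across training iterations; `gradient_snr` and the weights are hypothetical.

```python
import numpy as np


def gradient_snr(grads, eps=1e-12):
    """SNR of gradient samples: ||E[g]|| / sqrt(sum_p Var[g_p]).

    grads: (S, P) array holding S gradient samples of P parameters.
    """
    g = np.asarray(grads, dtype=float)
    mean = g.mean(axis=0)
    var = g.var(axis=0)  # per-parameter variance across the S samples
    return float(np.linalg.norm(mean) / (np.sqrt(var.sum()) + eps))


# Toy: per-point flow errors, the last point being an outlier.
errors = np.array([0.5, 0.5, 0.5, -4.5])
w = np.array([1.0, 1.0, 1.0, 0.01])       # hypothetical BA confidences
g_plain = (2.0 * errors)[:, None]         # d(e^2)/d(pred) = 2e per point
g_weighted = (2.0 * w * errors)[:, None]  # weights held constant (stop-gradient)
snr_plain = gradient_snr(g_plain)         # about 0.35
snr_weighted = gradient_snr(g_weighted)   # about 1.54
print(snr_plain, snr_weighted)
```

The single outlier flips the sign of the mean gradient and dominates its variance; down-weighting it restores a consistent gradient direction, which mirrors the higher SNR reported for the weighted flow loss in Figure 3.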