From Variance to Veracity: Unbundling and Mitigating Gradient Variance in Differentiable Bundle Adjustment Layers
Swaminathan Gurumurthy, Karnik Ram, Bingqing Chen, Zachary Manchester, Zico Kolter
TL;DR
The paper addresses training instability in pose estimation pipelines that couple learned correspondences with differentiable bundle adjustment (BA), identifying gradient variance as a key bottleneck. It analyzes three sources of variance (flow loss interference, BA linearization errors, and the dependence of weight gradients on the BA residual) and introduces a simple yet effective fix: weight the flow loss using the inner-loop BA-predicted weights $\Sigma_{jk}$ with stop-gradient, complemented by periodic balancing of loss contributions via $\beta$. This variance-reduction strategy yields $2$--$2.5\times$ training speedups on DPVO, improves stability, and maintains or improves pose accuracy on TartanAir and related benchmarks, with gains that transfer to other BA-based pipelines such as DROID-SLAM. The work highlights the broader potential of variance-aware training in optimization-augmented neural networks and suggests directions for further study of implicit-layer training dynamics and robustness to domain shift.
Abstract
Various pose estimation and tracking problems in robotics can be decomposed into a correspondence estimation problem (often computed using a deep network) followed by a weighted least squares optimization problem to solve for the poses. Recent work has shown that coupling the two problems by iteratively refining one conditioned on the other's output yields state-of-the-art results across domains. However, training these models has proved challenging, requiring a litany of tricks to stabilize and speed up training. In this work, we take the visual odometry problem as an example and identify three plausible causes: (1) flow loss interference, (2) linearization errors in the bundle adjustment (BA) layer, and (3) dependence of weight gradients on the BA residual. We show how these issues result in noisy, higher-variance gradients, potentially leading to slower training and instabilities. We then propose a simple yet effective solution that reduces the gradient variance by using the weights predicted by the network in the inner optimization loop to weight the correspondence objective in the training problem. This helps the training objective "focus" on the more important points, thereby reducing the variance and mitigating the influence of outliers. We show that the resulting method trains faster and can be trained more flexibly across varying training setups without sacrificing performance. In particular, we show $2$--$2.5\times$ training speedups over the baseline visual odometry model that we modify.
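To make the proposed weighting concrete, the following is a minimal sketch of a confidence-weighted flow loss, not the authors' implementation. The function name `weighted_flow_loss`, the use of an L2 endpoint error, and the normalization by the sum of the weights are all assumptions made for illustration. The weights are plain floats here, so they act as constants; in an autodiff framework the same effect would require an explicit stop-gradient on the weights.

```python
import math


def weighted_flow_loss(pred_flow, gt_flow, weights, beta=1.0):
    """Confidence-weighted flow loss (illustrative sketch, hypothetical API).

    pred_flow, gt_flow: sequences of (x, y) pairs, one per correspondence.
    weights: non-negative confidences, assumed to come from the BA layer
             and to be treated as constants (stop-gradient).
    beta: scalar that balances this term against the pose loss.
    """
    if not (len(pred_flow) == len(gt_flow) == len(weights)):
        raise ValueError("inputs must have the same length")
    total_w = sum(weights)
    if total_w <= 0:
        raise ValueError("weights must sum to a positive value")

    weighted_err = 0.0
    for (px, py), (gx, gy), w in zip(pred_flow, gt_flow, weights):
        # L2 endpoint error of one correspondence, scaled by its confidence.
        weighted_err += w * math.hypot(px - gx, py - gy)

    # Dividing by the weight sum is an assumption made here so the loss
    # scale does not depend on how many points receive high confidence.
    return beta * weighted_err / total_w


# Three correspondences; the third is an outlier with a large flow error.
pred = [(1.0, 0.0), (0.0, 2.0), (30.0, 40.0)]
gt = [(0.0, 0.0), (0.0, 0.0), (0.0, 0.0)]

uniform = weighted_flow_loss(pred, gt, [1.0, 1.0, 1.0])
confident = weighted_flow_loss(pred, gt, [1.0, 1.0, 0.0])
print(uniform, confident)
```

With uniform weights the outlier dominates the loss, whereas assigning it zero confidence removes its contribution entirely, which is the sense in which the weighted objective "focuses" on the more reliable points.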
