Learning to be Smooth: An End-to-End Differentiable Particle Smoother
Ali Younis, Erik B. Sudderth
TL;DR
The paper addresses offline state estimation in vision and robotics, where posteriors are often multi-modal, by introducing a fully differentiable two-filter particle smoother, the mixture density particle smoother (MDPS). MDPS runs a forward and a backward mixture density particle filter (MDPF) and fuses them through importance sampling from a mixed proposal $q(x)=\tfrac{1}{2}p(x|y_{1:t-1})+\tfrac{1}{2}p(x|y_{t+1:T})$, producing a smoothed posterior $p(x_t|y_{1:T})$ with $M\ll N^2$ samples; the weights are computed via a learned posterior weight network $l(\cdot)$, and the kernel-density posterior uses bandwidths $\overleftrightarrow{\beta}$. Training proceeds in three stages, enabling end-to-end learning of dynamics and measurement models, with gradients propagated through resampling via importance weighted samples gradient (IWSG) estimators. Empirical results on bearings-only tracking and city-scale global localization (Mapillary Geo-Localization and KITTI) show that MDPS outperforms state-of-the-art differentiable particle filters and retrieval-based baselines, yielding tighter posteriors and better multi-modal recall. The work demonstrates that integrating forward and backward temporal information within a differentiable framework improves robustness and accuracy on large-scale, real-world estimation tasks.
Abstract
For challenging state estimation problems arising in domains like vision and robotics, particle-based representations attractively enable temporal reasoning about multiple posterior modes. Particle smoothers offer the potential for more accurate offline data analysis by propagating information both forward and backward in time, but have classically required human-engineered dynamics and observation models. Extending recent advances in discriminative training of particle filters, we develop a framework for low-variance propagation of gradients across long time sequences when training particle smoothers. Our "two-filter" smoother integrates particle streams that are propagated forward and backward in time, while incorporating stratification and importance weights in the resampling step to provide low-variance gradient estimates for neural network dynamics and observation models. The resulting mixture density particle smoother is substantially more accurate than state-of-the-art particle filters, as well as search-based baselines, for city-scale global vehicle localization from real-world videos and maps.
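To make the idea of combining stratification with importance weights concrete, the following is a minimal 1-D NumPy sketch and not the authors' implementation. It assumes the forward and backward particle sets are each represented as a Gaussian kernel density estimate, draws half of the samples from each by stratified resampling, and reweights them by a target density divided by the equal-weight mixed proposal. The function names, the Gaussian kernels, and the `log_target` callable (a stand-in for a learned weight model) are all assumptions made for illustration.

```python
import numpy as np

def kde_logpdf(x, particles, weights, bandwidth):
    """Log-density of a 1-D Gaussian kernel density estimate at points x."""
    diff = (x[:, None] - particles[None, :]) / bandwidth
    log_kernel = -0.5 * diff**2 - np.log(bandwidth * np.sqrt(2.0 * np.pi))
    a = log_kernel + np.log(weights)[None, :]
    m = a.max(axis=1, keepdims=True)  # log-sum-exp for numerical stability
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True)))[:, 0]

def stratified_indices(weights, n, rng):
    """Stratified resampling: one uniform draw per stratum [i/n, (i+1)/n)."""
    u = (np.arange(n) + rng.uniform(size=n)) / n
    cdf = np.cumsum(weights)
    cdf[-1] = 1.0  # guard against floating-point round-off
    return np.searchsorted(cdf, u)

def sample_mixed_proposal(fwd, bwd, m, rng):
    """Draw m samples from q = 0.5 * fwd + 0.5 * bwd, half from each part.

    fwd and bwd are (particles, weights, bandwidth) tuples (assumed layout).
    Returns the samples and log q evaluated at each sample.
    """
    half = m // 2
    parts = []
    for (p, w, bw), n in ((fwd, half), (bwd, m - half)):
        idx = stratified_indices(w, n, rng)
        parts.append(p[idx] + bw * rng.standard_normal(n))  # add kernel noise
    x = np.concatenate(parts)
    log_q = np.logaddexp(kde_logpdf(x, *fwd), kde_logpdf(x, *bwd)) + np.log(0.5)
    return x, log_q

def smoothed_weights(x, log_q, log_target):
    """Self-normalized importance weights, proportional to target(x) / q(x)."""
    log_w = log_target(x) - log_q
    w = np.exp(log_w - log_w.max())
    return w / w.sum()

rng = np.random.default_rng(0)
fwd = (np.array([-1.0, 2.0]), np.array([0.5, 0.5]), 0.5)  # forward modes
bwd = (np.array([2.0, 5.0]), np.array([0.5, 0.5]), 0.5)   # backward modes
x, log_q = sample_mixed_proposal(fwd, bwd, 200, rng)
# Hypothetical unnormalized target that favors the mode both filters share.
w = smoothed_weights(x, log_q, lambda z: -0.5 * ((z - 2.0) / 0.3) ** 2)
print("weighted mean:", float(np.sum(w * x)))
```

In this toy setup the two filters agree only on the mode near 2, so the reweighted samples concentrate there. Evaluating the full mixture density in the denominator, rather than only the component a sample came from, is a standard variance-reduction choice for mixture proposals.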
