Learning to be Smooth: An End-to-End Differentiable Particle Smoother

Ali Younis, Erik B. Sudderth

TL;DR

The paper addresses offline state estimation in vision and robotics where posteriors are multi-modal by introducing a fully differentiable two-filter particle smoother (MDPS). MDPS runs a forward and a backward discriminative particle filter (MDPF) and fuses them through importance sampling of a mixed proposal $q(x)=\tfrac{1}{2}p(x|y_{1:t-1})+\tfrac{1}{2}p(x|y_{t+1:T})$, producing a smoothed posterior $p(x_t|y_{1:T})$ with $M\ll N^2$ samples; the weights are computed via a learned posterior weight network $l(\cdot)$ and the kernel-density posterior uses bandwidths $\overleftrightarrow{\beta}$. Training proceeds in three stages, enabling end-to-end learning of dynamics and measurement models, with gradients propagated through resampling via IWSG estimators. Empirical results on bearings-only tracking and city-scale global localization (Mapillary Geo-Localization and KITTI) show that MDPS outperforms state-of-the-art differentiable particle filters and retrieval-based baselines, yielding tighter posteriors and better multi-modal recall. The work demonstrates that integrating forward and backward temporal information within a differentiable framework significantly improves robustness and accuracy for large-scale, real-world estimation tasks.
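The fusion step can be illustrated with a minimal 1-D sketch. It is a stand-in built on our own assumptions, not the authors' method. MDPS computes weights with the learned network $l(\cdot)$ and uses bandwidths $\overleftrightarrow{\beta}$. This sketch instead uses the classical hand-crafted two-filter weight under an assumed uniform state prior, plus Gaussian kernels with one shared bandwidth. The helper names (`gaussian_kde`, `sample_kde`, `two_filter_fuse`) are hypothetical.

```python
import math
import random


def gaussian_kde(particles, weights, bandwidth):
    """Return a callable 1-D weighted Gaussian kernel density estimate."""
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))

    def density(x):
        return sum(w * norm * math.exp(-0.5 * ((x - p) / bandwidth) ** 2)
                   for p, w in zip(particles, weights))

    return density


def sample_kde(particles, weights, bandwidth, rng):
    """Draw one sample: choose a kernel by weight, then add Gaussian noise."""
    center = rng.choices(particles, weights=weights, k=1)[0]
    return rng.gauss(center, bandwidth)


def two_filter_fuse(fwd, bwd, likelihood, bandwidth, num_samples, rng):
    """Fuse forward and backward particle sets by importance sampling.

    fwd, bwd: (particles, normalized weights) approximating p(x_t|y_{1:t-1})
    and p(x_t|y_{t+1:T}). likelihood: callable approximating p(y_t|x_t).
    ASSUMPTION: a uniform state prior, so the smoothed posterior is
    proportional to forward(x) * likelihood(x) * backward(x). MDPS instead
    learns these weights with a network l(.).
    """
    f = gaussian_kde(*fwd, bandwidth)
    b = gaussian_kde(*bwd, bandwidth)
    samples, raw = [], []
    for _ in range(num_samples):
        # Proposal q(x) = 0.5 * forward density + 0.5 * backward density.
        source = fwd if rng.random() < 0.5 else bwd
        x = sample_kde(*source, bandwidth, rng)
        fx, bx = f(x), b(x)
        samples.append(x)
        raw.append(fx * likelihood(x) * bx / (0.5 * fx + 0.5 * bx))
    total = sum(raw)
    return samples, [w / total for w in raw]


# Usage: a bimodal forward belief and a backward belief that favors +5.
rng = random.Random(0)
fwd = ([-5.0] * 50 + [5.0] * 50, [0.01] * 100)
bwd = ([5.0] * 100, [0.01] * 100)
xs, ws = two_filter_fuse(fwd, bwd, lambda x: 1.0, 1.0, 500, rng)
print(sum(x * w for x, w in zip(xs, ws)))  # should land near +5
```

Each proposal sample evaluates each N-particle density once. So M samples cost O(MN) kernel evaluations, rather than the N^2 pairwise kernel products that an exact product of the two mixtures would require.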

Abstract

For challenging state estimation problems arising in domains like vision and robotics, particle-based representations attractively enable temporal reasoning about multiple posterior modes. Particle smoothers offer the potential for more accurate offline data analysis by propagating information both forward and backward in time, but have classically required human-engineered dynamics and observation models. Extending recent advances in discriminative training of particle filters, we develop a framework for low-variance propagation of gradients across long time sequences when training particle smoothers. Our "two-filter" smoother integrates particle streams that are propagated forward and backward in time, while incorporating stratification and importance weights in the resampling step to provide low-variance gradient estimates for neural network dynamics and observation models. The resulting mixture density particle smoother is substantially more accurate than state-of-the-art particle filters, as well as search-based baselines, for city-scale global vehicle localization from real-world videos and maps.
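The abstract's resampling step combines stratification with importance weights. Below is a minimal sketch of that generic combination, built on our own assumptions. It draws particle indices by stratified resampling from a proposal $a$ and then attaches normalized importance weights $w_i / a_i$, so the resampled set still represents the target weights $w$. The function name `stratified_importance_resample` is hypothetical. The sketch also omits the paper's gradient estimator, which would need an automatic-differentiation framework.

```python
import random


def stratified_importance_resample(weights, proposal, rng):
    """Stratified resampling from `proposal`, reweighted toward `weights`.

    weights: target particle weights w_i (need not be normalized).
    proposal: resampling probabilities a_i, positive wherever w_i > 0.
    Returns (indices, new_weights) with new weights proportional to w_i / a_i.
    """
    n = len(weights)
    total_a = sum(proposal)
    cdf, acc = [], 0.0
    for a in proposal:
        acc += a / total_a
        cdf.append(acc)
    cdf[-1] = 1.0  # guard against floating-point round-off

    indices, j = [], 0
    for i in range(n):
        # Exactly one uniform draw inside each stratum [i/n, (i+1)/n).
        u = (i + rng.random()) / n
        while cdf[j] < u:
            j += 1
        indices.append(j)

    # Importance correction: target weight over proposal probability.
    raw = [weights[k] / (proposal[k] / total_a) for k in indices]
    total = sum(raw)
    return indices, [r / total for r in raw]


# Usage: resample from a flattened proposal, then correct with weights.
w = [0.1, 0.2, 0.3, 0.4]
a = [0.5 * wi + 0.5 / len(w) for wi in w]  # mixture of w and uniform
idx, new_w = stratified_importance_resample(w, a, random.Random(0))
```

Choosing a = w gives uniform weights after resampling. A flatter proposal, such as the mixture of w with the uniform distribution used above, keeps low-weight particles alive at the cost of unequal weights. Stratification places exactly one uniform draw in each of the N strata. This lowers the variance of the number of copies each particle receives, compared with N independent multinomial draws.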

Paper Structure

This paper contains 29 sections, 26 equations, 20 figures, 3 tables.

Figures (20)

  • Figure 1: Left: Our MDPS method showing the forward and backward particle filters, which are integrated (via learned neural networks, indicated by trapezoids) to produce a smoothed mixture posterior. Right: Feature encoders and measurement model used for global localization. First-person camera views are encoded into a Birds-Eye-View (BEV) feature map by extracting features before applying a geometric projection sarlin2023orienternet. Map features are extracted via a feed-forward encoder, and un-normalized particle weights are computed as an inner product between BEV features and features of a local map extracted from the global map at the particle location.
  • Figure 2: Box plots showing median (red line), quartiles (blue box), and range (whiskers) over 11 training runs on the Bearings-Only tracking task. We boost the robustness of the top-performing MDPF younis2023mdpf, which previously used multinomial resampling, by incorporating variance-reduced stratified resampling; residual resampling is both slower and less effective. Stratified resampling provides larger advantages for the less sophisticated TG-PF jonschkowski18_differentiable_particle_filter and SR-PF pmlr-v87-karkus18a_soft_resampling gradient estimators, but these baselines remain inferior to MDPF. Our MDPS substantially improves on all PFs by incorporating both past and future observations when computing posteriors. Classic FFBS particle smoothers doucet2009tutorial, Klaas2006FastPS perform poorly, even when provided the true likelihoods (rather than a learned approximation), showing the effectiveness of our end-to-end learning of particle proposals and weights. Forward PFs are initialized with noisy samples of the true state, while MDPF-Backward (the backward-time PF component of MDPS) is initialized by sampling uniformly from the state space.
  • Figure 3: Position and error recall using the MGL sarlin2023orienternet dataset. Recall is computed with the top posterior mode as well as with the best of the top-3 posterior modes, extracted via non-maximal suppression. As expected, Retrieval noe2020eccv methods do poorly because they cannot discriminate between neighboring map patches. Dense search sarlin2023orienternet does better by using fine map details during localization, but it requires a ground-truth hint ("Cheating" with GT, which artificially improves performance) to work well in city-scale environments. Retrieval (PF) 9635972, GausePF uses unlearned state dynamics, which prove useful, but it still suffers from the poor discriminative ability of retrieval. In contrast, MDPF younis2023mdpf uses end-to-end learned dynamics and measurement models, which yields good performance, but it uses only past observations when estimating posterior densities. Our MDPS learns comparably strong dynamics and measurement models and also incorporates future as well as past observations, yielding a more accurate posterior density and thus higher recall.
  • Figure 4: Example trajectories from the MGL dataset with observations shown in the top row. We show the current true state and state history (black arrow and black line), the estimated posterior density of the current state (red cloud, with darker being higher probability) and the top 3 extracted modes (blue arrows) for the MDPS as well as its forward and backward MDPFs. Due to ambiguity at early time-steps, MDPF younis2023mdpf is unable to resolve the correct intersection, and instead places probability mass at multiple intersections. By fusing both forward and backward filters, our MDPS resolves this ambiguity with probability mass focused on the correct intersection. Furthermore, MDPS provides a tighter posterior density than either MDPF-Forward or MDPF-Backward.
  • Figure 5: Learned dynamics from the forward filter of MDPS trained on the MGL dataset. The density cloud shows the particle density after applying the dynamics, marginalized over actions. MDPS clearly learns informative, non-linear dynamics models that aid state posterior estimation.
  • ...and 15 more figures
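Figure 1's caption describes the measurement model as an inner product between BEV features and a local map crop taken at the particle location. Below is a toy version of that scoring, built on several assumptions:

- Features are plain nested lists.
- Particles are integer grid offsets that keep the crop inside the map.
- Particle orientation is ignored.
- The inner product is treated as a log-weight and exponentiated before normalization. The caption says only that un-normalized weights come from the inner product.

All function names are hypothetical.

```python
import math


def crop(grid, top, left, height, width):
    """Extract a height x width window of feature vectors from a 2-D grid."""
    return [row[left:left + width] for row in grid[top:top + height]]


def match_score(bev, local_map):
    """Inner product between two equally sized grids of feature vectors."""
    return sum(
        a * b
        for row_b, row_m in zip(bev, local_map)
        for feat_b, feat_m in zip(row_b, row_m)
        for a, b in zip(feat_b, feat_m)
    )


def particle_weights(bev, global_map, particles):
    """Normalized particle weights from BEV-to-map feature matching.

    particles: (top, left) grid offsets, assumed to keep the crop in bounds.
    ASSUMPTION: the inner product is used as a log-weight; orientation is
    ignored, unlike a real localizer that would rotate the map crop.
    """
    height, width = len(bev), len(bev[0])
    log_w = [match_score(bev, crop(global_map, r, c, height, width))
             for r, c in particles]
    peak = max(log_w)  # subtract the max for numerical stability
    unnorm = [math.exp(s - peak) for s in log_w]
    total = sum(unnorm)
    return [u / total for u in unnorm]


# Usage: a 4 x 4 map with 1-D features and one distinctive cell at (2, 2).
global_map = [[[1.0] if (r, c) == (2, 2) else [0.0] for c in range(4)]
              for r in range(4)]
bev = [[[5.0]]]  # 1 x 1 BEV observation that matches the distinctive cell
weights = particle_weights(bev, global_map, [(0, 0), (2, 2), (3, 1)])
```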
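Figure 2 compares against classic FFBS (forward-filtering backward-sampling) smoothers. Below is a minimal 1-D sketch of that baseline's backward-simulation step. After a forward filter stores weighted particles at every time step, one trajectory is drawn backward in time. At each step, the filter particles at time t are reweighted by the transition density toward the state already sampled at t+1. The Gaussian random-walk dynamics and all function names are hypothetical. The sketch mainly shows that FFBS needs an explicit transition density, whereas MDPS learns its dynamics end to end.

```python
import math
import random


def gaussian_transition(x_next, x_prev, sigma=1.0):
    """Hypothetical random-walk dynamics density p(x_{t+1} | x_t)."""
    z = (x_next - x_prev) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))


def ffbs_sample(particles, weights, transition, rng):
    """Draw one smoothed trajectory by backward simulation.

    particles[t], weights[t]: forward-filter particles and weights at time t.
    """
    T = len(particles)
    # Start from the final filtering distribution.
    k = rng.choices(range(len(particles[-1])), weights=weights[-1], k=1)[0]
    path = [particles[-1][k]]
    for t in range(T - 2, -1, -1):
        x_next = path[-1]
        # Reweight filter particles at t by how well they explain x_{t+1}.
        back_w = [w * transition(x_next, x)
                  for x, w in zip(particles[t], weights[t])]
        k = rng.choices(range(len(particles[t])), weights=back_w, k=1)[0]
        path.append(particles[t][k])
    path.reverse()  # return the trajectory in forward-time order
    return path


# Usage: three time steps, each with particles at -3 and +3.
particles = [[-3.0, 3.0]] * 3
weights = [[0.5, 0.5], [0.5, 0.5], [0.0, 1.0]]
path = ffbs_sample(particles, weights, gaussian_transition, random.Random(0))
```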
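Figure 3 reports recall using the top posterior mode and using the best of the top-3 modes found by non-maximal suppression. Below is one plausible way to compute both, under simplifying assumptions. Particle weights stand in for the posterior density at each particle, whereas a fuller version would evaluate the kernel density there. The suppression radius and the error threshold are free parameters. The function names are hypothetical.

```python
import math


def extract_modes(particles, weights, radius, k=3):
    """Greedy non-maximal suppression over weighted 2-D particles.

    Visits particles from highest to lowest weight and keeps one as a mode
    only if it lies farther than `radius` from every mode kept so far.
    """
    order = sorted(range(len(particles)), key=lambda i: weights[i],
                   reverse=True)
    modes = []
    for i in order:
        if len(modes) == k:
            break
        p = particles[i]
        if all(math.dist(p, m) > radius for m in modes):
            modes.append(p)
    return modes


def recall_at(estimates, truths, threshold, k):
    """Fraction of cases whose best of the top-k modes is within threshold."""
    hits = sum(
        1 for modes, gt in zip(estimates, truths)
        if min(math.dist(m, gt) for m in modes[:k]) <= threshold
    )
    return hits / len(truths)


# Usage: the strongest mode is wrong, but the second mode is correct.
particles = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (20.0, 0.0)]
weights = [0.4, 0.3, 0.2, 0.1]
modes = extract_modes(particles, weights, radius=1.0, k=3)
truth = (10.0, 0.0)
top1 = recall_at([modes], [truth], threshold=1.0, k=1)
top3 = recall_at([modes], [truth], threshold=1.0, k=3)
```

The example mirrors the gap between the two recall curves in Figure 3. A multi-modal posterior that ranks the correct hypothesis second fails top-1 recall but passes top-3 recall.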