Parallel-in-Time Kalman Smoothing Using Orthogonal Transformations
Shahaf Gargir, Sivan Toledo
TL;DR
The paper tackles the sequential bottleneck of Kalman smoothing by introducing a numerically stable parallel-in-time smoother based on a specialized sparse QR factorization with an odd-even block permutation. Covariance information is recovered through an adaptation of the SelInv selective-inversion algorithm, which computes the diagonal blocks of $(R^T R)^{-1}$ in parallel. Implemented in C/C++ with Threading Building Blocks (TBB), the Odd-Even smoother scales well on multi-core Intel and ARM servers, delivering speedups of up to 47x on 64 cores, and it generally outperforms the earlier parallel-in-time smoother of Särkkä and García-Fernández while maintaining numerical stability and flexibility (e.g., handling non-identity $H_i$ and unknown initial-state expectations). The price of parallelism is extra arithmetic: on a single core the smoother runs 1.8x to 2.5x slower than sequential smoothers. The implementation is available as open source for use in high-dimensional Kalman smoothing tasks.
Abstract
We present a numerically stable parallel-in-time linear Kalman smoother. The smoother uses a novel, highly parallel QR factorization for a class of structured sparse matrices for state estimation, and an adaptation of the SelInv selective-inversion algorithm to evaluate the covariance matrices of estimated states. Our implementation of the new algorithm, using the Threading Building Blocks (TBB) library, scales well on both Intel and ARM multi-core servers, achieving speedups of up to 47x on 64 cores. The algorithm performs more arithmetic than sequential smoothers; consequently, it is 1.8x to 2.5x slower on a single core. The new algorithm is faster and scales better than the parallel Kalman smoother proposed by Särkkä and García-Fernández in 2021.
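
To give a feel for the odd-even block permutation mentioned in the TL;DR, here is a minimal Python sketch of one plausible ordering, in the spirit of cyclic reduction. It is not taken from the authors' C/C++ code. It assumes that each state is coupled only to its two neighbors in time, and that the states in even positions are eliminated first at every level; both are assumptions made for illustration. The function names `odd_even_levels` and `odd_even_permutation` are hypothetical. The sketch computes only the ordering, not the QR factorization itself.

```python
def odd_even_levels(k):
    """Group block-column indices 0..k-1 into elimination levels.

    Assumption: the columns form a chain in which each one is coupled only
    to its immediate neighbors. At every level the entries in even positions
    of the remaining list are never neighbors of one another, so they can be
    processed concurrently; the entries in odd positions survive and form a
    shorter chain for the next level.
    """
    remaining = list(range(k))
    levels = []
    while remaining:
        levels.append(remaining[0::2])  # independent columns at this level
        remaining = remaining[1::2]     # survivors, handled at later levels
    return levels


def odd_even_permutation(k):
    """Flatten the levels into a single column ordering."""
    return [i for level in odd_even_levels(k) for i in level]


if __name__ == "__main__":
    for depth, level in enumerate(odd_even_levels(8)):
        print(f"level {depth}: {level}")
    print("permutation:", odd_even_permutation(8))
```

Under these assumptions, the number of levels grows only logarithmically with the number of time steps, which is what makes the available parallelism substantial; the extra work done to couple the survivors at each level is one way to picture why a scheme of this kind performs more arithmetic than a sequential sweep.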
