Mitigating Error Accumulation in Continuous Navigation via Memory-Augmented Kalman Filtering
Yin Tang, Jiawei Ma, Jinrui Zhang, Alex Jinpeng Wang, Deyu Zhang
TL;DR
This work tackles state drift in continuous UAV Vision-Language Navigation, which arises from dead reckoning, by reinterpreting navigation as recursive Bayesian state estimation. It introduces NeuroKalman, a two-stream framework that decouples a predictive prior (GRU-based motion dynamics) from a memory-guided likelihood update, where memory retrieval is formulated as Kernel Density Estimation over attention-derived anchors. A learnable Kalman gain fuses the prior and the corrected measurement to produce a posterior that limits drift, enabling robust, data-efficient navigation with limited fine-tuning. Experiments on TravelUAV demonstrate improved trajectory accuracy, strong generalization to unseen scenes and objects, and explicit mitigation of error accumulation, which matters for GPS-denied UAV operation and long-horizon planning. The approach offers a principled, memory-augmented alternative to purely parametric models, with implications for safer, more reliable autonomous navigation in complex environments.
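The fusion step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the single linear layer with a sigmoid used as the gain network, the latent dimension, and all names (`learnable_gain`, `kalman_style_fuse`, `W`, `b`) are assumptions made only for illustration. The one part taken from the standard Kalman update is the form posterior = prior + gain * (measurement - prior).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def learnable_gain(prior, measurement, W, b):
    """Hypothetical gain network (an assumption, not from the paper):
    one linear layer plus sigmoid, giving a per-dimension gain in (0, 1)."""
    features = np.concatenate([prior, measurement])
    return sigmoid(W @ features + b)

def kalman_style_fuse(prior, measurement, gain):
    """Kalman-style update: move the prior toward the measurement by `gain`."""
    return prior + gain * (measurement - prior)

rng = np.random.default_rng(0)
d = 4                                    # illustrative latent dimension
prior = rng.normal(size=d)               # stands in for the motion-dynamics prediction
measurement = rng.normal(size=d)         # stands in for the memory-corrected measurement
W = rng.normal(scale=0.1, size=(d, 2 * d))  # random stand-in for learned weights
b = np.zeros(d)

gain = learnable_gain(prior, measurement, W, b)
posterior = kalman_style_fuse(prior, measurement, gain)
```

With a gain of 0 the posterior equals the prior (pure dead reckoning), and with a gain of 1 it equals the measurement. Intermediate values interpolate between the two in each dimension.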
Abstract
Continuous navigation in complex environments is critical for Unmanned Aerial Vehicles (UAVs). However, existing Vision-Language Navigation (VLN) models rely on dead reckoning: they iteratively update the agent's position to predict the next waypoint and then assemble the complete trajectory from these steps. This stepwise procedure inevitably accumulates position errors over time, causing a misalignment between the internal belief and the actual coordinates. This misalignment, known as "state drift", ultimately compromises the full trajectory prediction. Drawing inspiration from classical control theory, we propose to correct these errors by formulating sequential prediction as a recursive Bayesian state estimation problem. We design NeuroKalman, a novel framework that decouples navigation into two complementary processes: a Prior Prediction based on motion dynamics, and a Likelihood Correction based on historical observations. We first mathematically associate Kernel Density Estimation of the measurement likelihood with the attention-based retrieval mechanism, which allows the system to rectify the latent representation using retrieved historical anchors without gradient updates. Comprehensive experiments on the TravelUAV benchmark demonstrate that, when fine-tuned on only 10% of the training data, our method clearly outperforms strong baselines and regulates drift accumulation.
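The link between Kernel Density Estimation and attention-based retrieval can be made concrete with a small numerical sketch. It rests on a general identity rather than on the paper's specific derivation: with a Gaussian kernel and keys of equal norm, normalized kernel weights coincide with softmax dot-product attention weights at temperature equal to the squared bandwidth. The function names, the bandwidth value, and the random "anchors" are hypothetical.

```python
import numpy as np

def softmax(x):
    z = x - np.max(x)          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def gaussian_kernel_weights(query, keys, bandwidth):
    """Normalized Gaussian kernel weights of the query against each stored key."""
    sq_dist = np.sum((keys - query) ** 2, axis=1)
    return softmax(-sq_dist / (2.0 * bandwidth ** 2))

def dot_product_attention_weights(query, keys, bandwidth):
    """Softmax attention over dot products with temperature bandwidth**2."""
    return softmax(keys @ query / bandwidth ** 2)

def retrieve(query, keys, values, bandwidth):
    """Kernel-weighted average of stored values; no gradient update is involved."""
    return gaussian_kernel_weights(query, keys, bandwidth) @ values

rng = np.random.default_rng(0)
keys = rng.normal(size=(5, 8))
keys /= np.linalg.norm(keys, axis=1, keepdims=True)  # equal-norm keys (assumption)
values = rng.normal(size=(5, 8))                     # hypothetical stored anchors
query = keys[2] + 0.05 * rng.normal(size=8)          # noisy copy of anchor 2

w_kde = gaussian_kernel_weights(query, keys, bandwidth=0.5)
w_attn = dot_product_attention_weights(query, keys, bandwidth=0.5)
corrected = retrieve(query, keys, values, bandwidth=0.5)
```

Expanding the squared distance gives the query norm, the key norm, and a dot-product term. The first two are constant across keys under the equal-norm assumption, so the softmax cancels them and only the dot product remains. Without that assumption the two weightings differ by a per-key norm term.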
