Mitigating Error Accumulation in Continuous Navigation via Memory-Augmented Kalman Filtering

Yin Tang, Jiawei Ma, Jinrui Zhang, Alex Jinpeng Wang, Deyu Zhang

TL;DR

This work tackles state drift in continuous UAV Vision-Language Navigation, which arises from dead reckoning, by recasting navigation as recursive Bayesian state estimation. It introduces NeuroKalman, a two-stream framework that decouples a predictive prior (GRU-based motion dynamics) from a memory-guided likelihood update, in which memory retrieval is formulated as Kernel Density Estimation over attention-derived anchors. A learnable Kalman gain fuses the prior with the corrected measurement to produce a posterior that limits drift, supporting data-efficient navigation with limited fine-tuning. Experiments on TravelUAV show improved trajectory accuracy, strong generalization to unseen scenes and objects, and explicit mitigation of error accumulation, which matters for GPS-denied UAV operation and long-horizon planning. The approach offers a principled, memory-augmented alternative to purely parametric models, with implications for safer, more reliable autonomous navigation in complex environments.
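
The fusion step summarized above can be sketched as a Kalman-style update in a latent space. This is a minimal illustration, not the paper's implementation: the function names `kalman_gain` and `fuse`, the scalar variances, and the closed-form gain are assumptions made here. In NeuroKalman the gain is learned, whereas this sketch uses the classical scalar formula.

```python
import numpy as np

def kalman_gain(prior_var: float, meas_var: float) -> float:
    """Classical scalar Kalman gain (assumed here; the paper learns its gain).

    A small measurement variance pushes the gain toward 1 (trust the
    measurement); a large one pushes it toward 0 (trust the prior).
    """
    return prior_var / (prior_var + meas_var)

def fuse(z_prior: np.ndarray, r_meas: np.ndarray, gain: float) -> np.ndarray:
    """Posterior = prior + gain * innovation, with innovation = r_meas - z_prior."""
    return z_prior + gain * (r_meas - z_prior)

# Hypothetical 3-dimensional latent states, for illustration only.
z_prior = np.array([1.0, 2.0, 3.0])   # plays the role of the prior state
r_meas = np.array([2.0, 2.0, 1.0])    # plays the role of the measurement representation

k = kalman_gain(prior_var=1.0, meas_var=3.0)   # 0.25
z_post = fuse(z_prior, r_meas, k)              # [1.25, 2.0, 2.5]
print(k, z_post)
```

With a gain of 0 the posterior equals the prior (pure dead reckoning), and with a gain of 1 it equals the measurement, so the gain controls how strongly the retrieved evidence is allowed to pull the state back.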

Abstract

Continuous navigation in complex environments is critical for Unmanned Aerial Vehicles (UAVs). However, existing Vision-Language Navigation (VLN) models rely on dead reckoning: they iteratively update the estimated position to predict the next waypoint and build the complete trajectory step by step. This stepwise scheme inevitably accumulates position errors over time, causing a misalignment between the internal belief and the true coordinates. This misalignment is known as "state drift" and ultimately compromises the full trajectory prediction. Drawing inspiration from classical control theory, we propose to correct these errors by formulating sequential prediction as a recursive Bayesian state estimation problem. We design NeuroKalman, a novel framework that decouples navigation into two complementary processes: a Prior Prediction based on motion dynamics, and a Likelihood Correction based on historical observations. We first mathematically relate Kernel Density Estimation of the measurement likelihood to the attention-based retrieval mechanism, which allows the system to rectify the latent representation using retrieved historical anchors without gradient updates. Comprehensive experiments on the TravelUAV benchmark demonstrate that, when fine-tuned on only 10% of the training data, our method clearly outperforms strong baselines and regulates drift accumulation.
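
The link between kernel density estimation and attention-based retrieval can be illustrated with a standard identity: when all keys have the same norm, Gaussian-kernel weights over the keys coincide with softmax dot-product attention weights. The sketch below checks this numerically. It is not the paper's derivation, which is not shown here; the unit-norm keys, the bandwidth `h2`, and all function names are assumptions for illustration.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()

def attention_weights(q: np.ndarray, keys: np.ndarray, h2: float) -> np.ndarray:
    """Softmax of scaled dot products between the query and each key."""
    return softmax(keys @ q / h2)

def gaussian_kernel_weights(q: np.ndarray, keys: np.ndarray, h2: float) -> np.ndarray:
    """Normalized Gaussian kernel weights exp(-||q - k||^2 / (2 * h2))."""
    sq_dist = ((keys - q) ** 2).sum(axis=1)
    return softmax(-sq_dist / (2.0 * h2))

rng = np.random.default_rng(0)
d, n = 8, 5                                   # latent size and memory size (hypothetical)
keys = rng.normal(size=(n, d))
keys /= np.linalg.norm(keys, axis=1, keepdims=True)   # assumption: unit-norm keys
values = rng.normal(size=(n, d))              # stored anchors to retrieve
q = rng.normal(size=d)                        # current query
h2 = np.sqrt(d)                               # squared bandwidth, chosen to match 1/sqrt(d) scaling

w_attn = attention_weights(q, keys, h2)
w_kde = gaussian_kernel_weights(q, keys, h2)
retrieved = w_attn @ values                   # kernel-weighted average of the anchors

print(np.allclose(w_attn, w_kde))
```

The two weightings agree because ||q - k||^2 = ||q||^2 + ||k||^2 - 2 q·k, and with equal-norm keys the first two terms are constant across keys and cancel in the normalization. Retrieval is a forward pass over stored anchors, so no gradient update is involved.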

Paper Structure (19 sections, 12 equations, 6 figures, 6 tables)

Figures (6)

  • Figure 1: Illustration of state drift mitigation. Given a global instruction, existing models ignore the history and predict only from current inputs, so they suffer from accumulated error and state drift that ends in a collision (orange line). In contrast, our NeuroKalman framework introduces a Kalman correction mechanism that fuses historical measurements as anchors to rectify the trajectory prediction (blue line).
  • Figure 2: The NeuroKalman framework leverages temporal context to improve next-step prediction in navigation. We follow the logic of classic Kalman filtering [sarkka2023bayesian] and its Prediction and Update steps [kalman1960new]: the former makes an initial estimate, while the latter estimates the measurement representation $\mathbf{r}_t$ for the core Kalman correction. In detail, the Prediction Block employs a GRU to coarsely model the motion dynamics: given the posterior state $\mathbf{z}_{t-1}$ from the previous step, it predicts the prior state $\tilde{\mathbf{z}}_t$ and updates the hidden state $\mathbf{h}_t$. Then, using the confidence scalar $\sigma_t$ predicted by the Update Block, the Kalman Gain $K_t$ is estimated in the representation space for correction. The waypoint prediction $\phi(\mathbf{z}_t)$ is omitted for clarity, while $\mathbf{r}_t$ and $\tilde{\mathbf{z}}_t$ can both be fed into $\phi(\cdot)$ for augmented supervision.
  • Figure 3: Demonstration of trajectory rectification. TravelUAV-FT relies solely on parametric predictions to estimate its trajectory, resulting in obvious trajectory drift. NeuroKalman rectifies its position by integrating Kalman correction.
  • Figure 4: Visualization of $L_2$ position error over time. The baselines (orange and red dashed lines) show a continuous error increase on long trajectories. In contrast, NeuroKalman (blue solid line) keeps the error stable and prevents it from growing rapidly through Kalman correction.
  • Figure 5: Navigation example comparison between TravelUAV-FT and our NeuroKalman (top-down view). Due to severe state drift, TravelUAV-FT fails to recognize key landmarks and loses its orientation, resulting in a failed search. In contrast, NeuroKalman anchors its position against structural features and maintains the correct heading towards the target.
  • ...and 1 more figure
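
The qualitative behavior described in the Figure 4 caption (error growing under dead reckoning, staying bounded under correction) can be reproduced in a toy simulation. This is a sketch under stated assumptions, not the paper's experiment: the 2-D constant-velocity motion, the Gaussian noise levels, the fixed gain, and the function name `simulate` are all hypothetical, and the "measurement" here is a noisy absolute position standing in for a retrieved anchor.

```python
import numpy as np

def simulate(steps=200, step_noise=0.1, meas_noise=0.5, gain=0.2, seed=0):
    """Return per-step L2 position errors for dead reckoning and a corrected estimate."""
    rng = np.random.default_rng(seed)
    move = np.array([1.0, 0.0])        # true displacement per step (assumed constant)
    true_pos = np.zeros(2)
    dr_est = np.zeros(2)               # dead reckoning: integrates noisy displacements only
    kf_est = np.zeros(2)               # corrected: same prediction plus a gain-weighted update
    dr_err, kf_err = [], []
    for _ in range(steps):
        true_pos = true_pos + move
        noisy_move = move + rng.normal(scale=step_noise, size=2)

        # Dead reckoning: each step's noise is added and never removed.
        dr_est = dr_est + noisy_move

        # Prediction followed by correction toward a noisy absolute measurement.
        prior = kf_est + noisy_move
        meas = true_pos + rng.normal(scale=meas_noise, size=2)
        kf_est = prior + gain * (meas - prior)

        dr_err.append(np.linalg.norm(dr_est - true_pos))
        kf_err.append(np.linalg.norm(kf_est - true_pos))
    return np.array(dr_err), np.array(kf_err)

dr_err, kf_err = simulate()
print("mean L2 error, dead reckoning:", dr_err.mean())
print("mean L2 error, with correction:", kf_err.mean())
```

In this toy setting the dead-reckoning error behaves like a random walk, so its expected magnitude grows with the number of steps, while the corrected estimate's error settles to a steady level set by the gain and the two noise scales.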