The Empirical Impact of Forgetting and Transfer in Continual Visual Odometry
Paolo Cudrano, Xiaoyu Luo, Matteo Matteucci
TL;DR
This paper studies how forgetting and transfer manifest when visual odometry (VO) is learned in an embodied, lifelong setting. It empirically analyzes continual VO across 72 Habitat apartment experiences, using a ResNet-based regressor trained with a regression loss $\mathcal{L}$ to predict the displacement $\boldsymbol{\nabla} = (\nabla_z,\nabla_x,\nabla_\theta)$ from RGB-D frames, with and without action conditioning. The study finds strong initial forward transfer followed by a specialization phase that degrades generalization. Regularization strategies (e.g., EWC, LwF) do not mitigate forgetting, while rehearsal helps modestly at a memory cost; increasing model capacity does not alleviate the effect, and action information speeds up learning but increases environment-specific bias. These results highlight fundamental challenges in applying off-the-shelf continual learning (CL) methods to embodied robotics and motivate CL approaches tailored to embodied agents for long-term self-localization. The work thus offers insight into the trade-off between adaptation and memory retention in lifelong robotics and sets benchmarks for future embodied continual learning research.
Abstract
As robotics continues to advance, the need for adaptive, continuously learning embodied agents increases, particularly in assistive robotics. Quick adaptability and long-term information retention are essential to operate in the dynamic environments typical of everyday human life. A lifelong learning paradigm is thus required, but it is scarcely addressed in the current robotics literature. This study empirically investigates the impact of catastrophic forgetting and the effectiveness of knowledge transfer in neural networks trained continually in an embodied setting. We focus on the task of visual odometry, which is of primary importance for embodied agents because it enables their self-localization. We experiment on the simple continual scenario of discrete transitions between indoor locations, akin to a robot navigating different apartments. In this regime, we observe satisfactory initial performance with high transferability between environments, followed by a specialization phase in which the model prioritizes knowledge specific to the current environment at the expense of generalization. Conventional regularization strategies and increased model capacity prove ineffective in mitigating this phenomenon. Rehearsal is instead mildly beneficial, but at a substantial memory cost. Incorporating action information, as commonly done in embodied settings, facilitates quicker convergence but exacerbates specialization, making the model overly reliant on its motion expectations and less adept at correctly interpreting visual cues. These findings emphasize the open challenges of balancing adaptation and memory retention in lifelong robotics and contribute valuable insights into the application of a lifelong paradigm to embodied agents.
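
To make the notions of forgetting and forward transfer concrete, the following minimal sketch shows one common way to quantify them from a matrix of evaluation errors. It is an illustration under stated assumptions, not necessarily the metric used in this study: the function names, the `errors` matrix layout, the `baseline` vector, and all numbers are hypothetical.

```python
from typing import List, Sequence


def forgetting(errors: Sequence[Sequence[float]]) -> List[float]:
    """Per-environment forgetting for an error metric (lower is better).

    Assumed layout: errors[i][j] is the error on environment j measured
    after training sequentially on environments 0..i.
    A positive value means the error on environment j grew between the
    moment it was learned and the end of the sequence.
    """
    last = len(errors) - 1
    return [errors[last][j] - errors[j][j] for j in range(last)]


def forward_transfer(
    errors: Sequence[Sequence[float]], baseline: Sequence[float]
) -> List[float]:
    """Per-environment forward transfer for an error metric.

    baseline[j] is the (hypothetical) error of an untrained model on
    environment j. A positive value means that training on environments
    0..j-1 already lowered the error on the unseen environment j.
    """
    return [baseline[j] - errors[j - 1][j] for j in range(1, len(errors))]


if __name__ == "__main__":
    # Hypothetical errors for three environments (not results from the paper).
    errors = [
        [0.125, 0.375, 0.500],
        [0.250, 0.125, 0.375],
        [0.375, 0.250, 0.125],
    ]
    baseline = [0.75, 0.75, 0.75]
    print("forgetting:", forgetting(errors))
    print("forward transfer:", forward_transfer(errors, baseline))
```

In this toy example the error on earlier environments rises after later training (positive forgetting), while previously unseen environments start below the untrained baseline (positive forward transfer), mirroring the qualitative pattern described in the abstract.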
