The Empirical Impact of Forgetting and Transfer in Continual Visual Odometry

Paolo Cudrano, Xiaoyu Luo, Matteo Matteucci

TL;DR

This paper studies how forgetting and transfer manifest when visual odometry (VO) is learned in an embodied, lifelong setting. It empirically analyzes continual VO across 72 Habitat apartment experiences using a ResNet-based regressor trained with a regression loss $\mathcal{L}$ to predict the displacement $\boldsymbol{\Delta} = (\Delta_z, \Delta_x, \Delta_\theta)$ from RGB-D frames, with and without action conditioning. The study finds strong initial forward transfer followed by a specialization phase that degrades generalization. Regularization strategies (e.g., EWC, LwF) do not mitigate forgetting, while rehearsal helps modestly at a memory cost. Increasing model capacity does not alleviate the effect, and action information speeds up learning but increases environment-specific bias. These results highlight fundamental challenges in applying off-the-shelf continual learning (CL) methods to embodied robotics and motivate the development of CL approaches tailored to embodied agents for long-term self-localization. Overall, the work characterizes the trade-off between adaptation and memory retention in lifelong robotics and provides a reference point for future embodied continual learning research.

Abstract

As robotics continues to advance, the need for adaptive and continuously-learning embodied agents increases, particularly in the realm of assistance robotics. Quick adaptability and long-term information retention are essential to operate in dynamic environments typical of humans' everyday lives. A lifelong learning paradigm is thus required, but it is scarcely addressed by current robotics literature. This study empirically investigates the impact of catastrophic forgetting and the effectiveness of knowledge transfer in neural networks trained continuously in an embodied setting. We focus on the task of visual odometry, which holds primary importance for embodied agents in enabling their self-localization. We experiment on the simple continual scenario of discrete transitions between indoor locations, akin to a robot navigating different apartments. In this regime, we observe initial satisfactory performance with high transferability between environments, followed by a specialization phase where the model prioritizes current environment-specific knowledge at the expense of generalization. Conventional regularization strategies and increased model capacity prove ineffective in mitigating this phenomenon. Rehearsal is instead mildly beneficial but with the addition of a substantial memory cost. Incorporating action information, as commonly done in embodied settings, facilitates quicker convergence but exacerbates specialization, making the model overly reliant on its motion expectations and less adept at correctly interpreting visual cues. These findings emphasize the open challenges of balancing adaptation and memory retention in lifelong robotics and contribute valuable insights into the application of a lifelong paradigm on embodied agents.
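
To make the link between visual odometry and self-localization concrete, the following minimal sketch shows how per-step displacement estimates could be chained into a trajectory. It is an illustration only, not the authors' code. It assumes a planar pose `(z, x, theta)` and an egocentric displacement `(dz, dx, dtheta)` in which `z` is the forward axis and `x` the lateral axis of the agent's frame at time `t`. The function names `compose` and `integrate` are hypothetical, and the axis and sign conventions of the actual simulator may differ.

```python
import math

def compose(pose, delta):
    """Apply an egocentric displacement (dz, dx, dtheta) to a world pose (z, x, theta).

    Assumed convention (illustrative): dz is along the agent's heading,
    dx is perpendicular to it, and theta is the heading in radians.
    """
    z, x, theta = pose
    dz, dx, dtheta = delta
    # Rotate the egocentric translation into the world frame, then add it.
    z_new = z + dz * math.cos(theta) - dx * math.sin(theta)
    x_new = x + dz * math.sin(theta) + dx * math.cos(theta)
    # Update the heading and wrap it into [-pi, pi].
    total = theta + dtheta
    theta_new = math.atan2(math.sin(total), math.cos(total))
    return (z_new, x_new, theta_new)

def integrate(deltas, start=(0.0, 0.0, 0.0)):
    """Chain a sequence of displacement estimates into a list of poses."""
    poses = [start]
    for delta in deltas:
        poses.append(compose(poses[-1], delta))
    return poses

if __name__ == "__main__":
    # Move forward 1 unit while turning 90 degrees, then move forward 1 unit again.
    for pose in integrate([(1.0, 0.0, math.pi / 2), (1.0, 0.0, 0.0)]):
        print(pose)
```

Because each pose is built from the previous one, any error in a single displacement estimate propagates to all later poses, which is why the accuracy of the per-step regressor matters for long-term localization.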

Paper Structure

This paper contains 19 sections, 4 equations, 23 figures, 5 tables.

Figures (23)

  • Figure 1: Visual odometry (VO). A mobile agent equipped with a camera moves through its environment while observing the scene. Because the agent is subject to noisy actuation, its actual motion is not known with high precision after a motion action $a_t$ is performed. Visual odometry estimates the actual displacement $(\widehat{\Delta_z}, \widehat{\Delta_x}, \widehat{\Delta_{\theta}})$ undergone by the agent between time steps $t$ and $t+1$, using the images acquired by its camera at those two steps ($I_t$, $I_{t+1}$).
  • Figure 2: Average test loss across all experiences when performing naive finetuning, i.e., training on the sequence of apartments continually. We compare against joint training on all apartments and observe a large gap in the converged performance of the two methods (3.88e-3 against 0.5e-3).
  • Figure 3: Progression during lifelong training of the loss reached on the current apartment against the loss scored on past or future apartments. The graph highlights how, after an initial phase, the network improves only on the apartment it is currently visiting, while past and future experiences remain flat.
  • Figure 4: Comparison of each loss component over current, past, and future experiences.
  • Figure 5: Comparison of continual learning metrics: backward transfer, forgetting ratio, and forward transfer.
  • ...and 18 more figures
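
The backward and forward transfer metrics named in Figure 5 can be illustrated with a minimal sketch. The definitions below follow the common formulation based on a matrix of test results and are an assumption here, since the paper's exact definitions are not part of this summary. The forgetting ratio is omitted for the same reason. The function name `transfer_metrics`, the matrix `R`, and the `baseline` vector are hypothetical. Because the entries are losses rather than accuracies, a positive backward transfer value indicates forgetting and a negative forward transfer value indicates beneficial transfer.

```python
def transfer_metrics(R, baseline):
    """Compute loss-based backward and forward transfer.

    R[i][j]     : test loss on experience j after training on experience i.
    baseline[j] : test loss on experience j of a reference model
                  (for example, the network before any training).

    Returns (bwt, fwt). With losses, bwt > 0 means past experiences got
    worse (forgetting) and fwt < 0 means earlier training helped on
    experiences not yet visited.
    """
    T = len(R)
    if T < 2 or any(len(row) != T for row in R) or len(baseline) != T:
        raise ValueError("R must be a TxT matrix with T >= 2 and baseline must have T entries")

    # Backward transfer: final loss on each past experience minus the loss
    # measured right after that experience was learned.
    bwt = sum(R[T - 1][j] - R[j][j] for j in range(T - 1)) / (T - 1)

    # Forward transfer: loss on each experience just before training on it,
    # relative to the reference model.
    fwt = sum(R[j - 1][j] - baseline[j] for j in range(1, T)) / (T - 1)
    return bwt, fwt

if __name__ == "__main__":
    # Toy numbers for illustration only, not results from the paper.
    R = [
        [1.0, 4.0, 5.0],
        [2.0, 1.0, 4.0],
        [3.0, 2.0, 1.0],
    ]
    baseline = [6.0, 6.0, 6.0]
    bwt, fwt = transfer_metrics(R, baseline)
    print(f"backward transfer: {bwt:+.2f}, forward transfer: {fwt:+.2f}")
```

In the toy matrix the diagonal stays low while off-diagonal entries in earlier columns grow, mimicking the specialization pattern described for Figure 3: the loss on the current apartment improves while past apartments degrade.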