TTA-Nav: Test-time Adaptive Reconstruction for Point-Goal Navigation under Visual Corruptions

Maytus Piriyajitakonkij; Mingfei Sun; Mengmi Zhang; Wei Pan

TL;DR

The TTA-Nav method improves the success rate of point-goal navigation from the state-of-the-art result of 46% to 94% on the most severe corruption, suggesting its potential for broader application in robotic visual navigation.

Abstract

Robot navigation under visual corruption presents a formidable challenge. To address this, we propose a Test-time Adaptation (TTA) method, named TTA-Nav, for point-goal navigation under visual corruptions. Our "plug-and-play" method adds a top-down decoder to a pre-trained navigation model. First, the pre-trained navigation model receives a corrupted image and extracts features. Second, the top-down decoder produces a reconstruction from the high-level features extracted by the pre-trained model. Third, the reconstruction of the corrupted image is fed back to the pre-trained model. Finally, the pre-trained model performs a second forward pass to output an action. Despite being trained solely on clean images, the top-down decoder can reconstruct cleaner images from corrupted ones without the need for gradient-based adaptation. The pre-trained navigation model with our top-down decoder significantly enhances navigation performance across almost all visual corruptions in our benchmarks. Our method improves the success rate of point-goal navigation from the state-of-the-art result of 46% to 94% on the most severe corruption. This suggests its potential for broader application in robotic visual navigation. Project page: https://sites.google.com/view/tta-nav
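The four steps in the abstract can be sketched as a single control step. This is a minimal illustration, not the authors' implementation: the function name `tta_nav_step`, the type aliases, and the toy `encoder`, `decoder`, and `policy` stand-ins are all hypothetical, and real models would operate on image tensors rather than short lists of floats.

```python
from typing import Callable, List

# Hypothetical type aliases: a flattened image and a feature vector.
Image = List[float]
Features = List[float]


def tta_nav_step(
    encoder: Callable[[Image], Features],
    decoder: Callable[[Features], Image],
    policy: Callable[[Features], int],
    observation: Image,
) -> int:
    """One control step: encode, reconstruct, re-encode, act."""
    features = encoder(observation)            # pass 1: features of the corrupted image
    reconstruction = decoder(features)         # top-down reconstruction from the features
    clean_features = encoder(reconstruction)   # pass 2: features of the reconstruction
    return policy(clean_features)              # action chosen from the second pass


# Toy stand-ins (assumptions for illustration only).
def encoder(image: Image) -> Features:
    return [sum(image) / len(image)]           # "feature" = mean intensity


def decoder(features: Features) -> Image:
    return [features[0]] * 4                   # "reconstruction" = flat image


def policy(features: Features) -> int:
    return 1 if features[0] >= 0.5 else 0      # two discrete actions


action = tta_nav_step(encoder, decoder, policy, [0.0, 1.0, 1.0, 1.0])
```

The point of the sketch is the call order: the encoder runs twice per step, and the policy only ever sees features computed from the reconstruction, never from the raw corrupted observation.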

Paper Structure (16 sections, 2 equations, 5 figures, 2 tables, 2 algorithms)

Introduction
Related work
Visual Navigation
Domain Generalization (DG)
Domain adaptation (DA)
Test-time adaptation (TTA)
Methods
Point-Goal Navigation Problem Formulation
End-to-End Visual Navigation Models
Algorithm
Experiments
Visual Corruptions
Baselines
Evaluation Metrics
Results
...and 1 more sections

Figures (5)

Figure 1: Point-Goal Navigation Under Visual Corruptions: A robot is tasked with moving from a start position to a goal position. Top: The robot is equipped with the state-of-the-art navigation method DD-PPO (Wijmans et al., 2019). It fails to navigate from the bedroom to another room under dimmed lighting. Bottom: The robot is equipped with our method. It receives the reconstructed image as the "surrogate observation" instead of the real observation from the scene.
Figure 2: An overview of our proposed method (TTA-Nav): The pretrained navigation model (Pretrained-Nav), which contains a Visual Encoder (VE) and a policy network, is shown in the purple frame. TTA-Nav is a plug-and-play method. Its Top-down Decoder (TD) is shown in the orange frame. TD receives the output of a late layer of VE and projects the reconstructed image back to VE's input layer. Here, VE is SE-ResNeXt-50 (Hu et al., 2018) and TD receives block3's output. Left: During training, VE remains fixed, while TD is trained to predict VE's input by minimizing the Mean Squared Error (MSE) loss. Training occurs offline, using samples from Replay, which contains the visual experiences of a robot navigating 72 Gibson training scenes. Right: During testing, TD is frozen and the BatchNorm layers of VE can adjust their normalization statistics $\hat{\mu}_{k}$ and $\hat{\sigma}_{k}$. See the Methods section for more details. Finally, the reconstructed image is input to VE for navigation. VE performs a second forward pass and the policy outputs actions.
Figure 3: Visual Corruptions: Two consecutive images of the same corruption type are shown in temporal order, as indicated by the arrow. Every corruption is applied frame by frame, so the corruption in frame $t$ is independent of the corruption in frame $t+1$. For instance, in the Occlusion type at the bottom right, the position and color of the box at time $t$ are independent of the position and color of the box at time $t+1$.
Figure 4: Reconstructions from Top-down Decoder (TD): The left columns are the inputs of the Visual Encoder (VE). No Adapt: Images in this column are the reconstructions of VE's inputs without adaptation. Adapt: Images in this column are reconstructions of the same VE inputs with adaptation. The quality of reconstructed images matters in navigation. For example, bad reconstructions, such as those for row 12 (Fog) and row 13 (Shadow) in the performance table, may degrade navigation performance compared to non-adaptive approaches.
Figure 5: Navigation Behavior: The figure shows examples of robot behavior in the same setting as used in Figure 1. This navigation route is one of the most difficult routes in the scene, ranked by the distance between start and goal. * denotes severe corruption. No Adaptation refers to the non-adaptive state-of-the-art navigation model, DD-PPO. Our Method refers to TTA-Nav.
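
The Figure 2 caption says that at test time the BatchNorm layers of VE adjust their normalization statistics $\hat{\mu}_{k}$ and $\hat{\sigma}_{k}$ without any gradient step. The caption does not give the update rule, so the sketch below assumes an exponential moving average with a hypothetical `momentum` parameter, which is one common way to refresh such statistics. The function names are illustrative, and the sketch tracks the variance (the square of $\hat{\sigma}_{k}$) for a single channel.

```python
from typing import List, Tuple


def batch_stats(values: List[float]) -> Tuple[float, float]:
    """Mean and (biased) variance of one channel over a test batch."""
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var


def adapt_stats(
    mu_hat: float,
    var_hat: float,
    values: List[float],
    momentum: float = 0.1,  # assumed value, not taken from the paper
) -> Tuple[float, float]:
    """Blend stored statistics with test-batch statistics (assumed EMA rule)."""
    mean, var = batch_stats(values)
    new_mu = (1.0 - momentum) * mu_hat + momentum * mean
    new_var = (1.0 - momentum) * var_hat + momentum * var
    return new_mu, new_var


def normalize(
    values: List[float], mu_hat: float, var_hat: float, eps: float = 1e-5
) -> List[float]:
    """Apply BatchNorm-style normalization with the current statistics."""
    scale = (var_hat + eps) ** 0.5
    return [(v - mu_hat) / scale for v in values]


# Usage: statistics stored from clean training data, then one corrupted batch.
mu_hat, var_hat = 0.0, 1.0
corrupted_batch = [2.0, 4.0]
mu_hat, var_hat = adapt_stats(mu_hat, var_hat, corrupted_batch, momentum=0.5)
normalized = normalize(corrupted_batch, mu_hat, var_hat)
```

Only the stored statistics change here; no weights are updated, which is consistent with the abstract's statement that no gradient-based adaptation is needed.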

Theorems & Definitions (2)

Remark 1
Remark 2
