Towards Deviation-Robust Agent Navigation via Perturbation-Aware Contrastive Learning

Bingqian Lin; Yanxin Long; Yi Zhu; Fengda Zhu; Xiaodan Liang; Qixiang Ye; Liang Lin

TL;DR

The paper tackles vision-and-language navigation under real-world disturbances by introducing PROPER, a training framework that induces deviation-robust behavior through progressively perturbed trajectories and a perturbation-aware contrastive objective. It uses an edge-deletion perturbation to simulate route deviations, progressively augments the training data with perturbed trajectories, and applies InfoNCE-based losses to distinguish perturbation-free from perturbation-based trajectory encodings. PROPER is model-agnostic and is validated on the Room-to-Room (R2R) dataset and on a Path-Perturbed R2R (PP-R2R) subset, showing improved robustness to perturbation across multiple strong VLN baselines as well as generalization to CVDN. The results suggest that exposure to perturbations during training improves navigation in both perturbation-free and perturbed settings, with practical implications for real-world robot assistants and embodied agents.

Abstract

Vision-and-language navigation (VLN) asks an agent to follow a given language instruction to navigate through a real 3D environment. Despite significant advances, conventional VLN agents are typically trained in disturbance-free environments and may easily fail in real-world scenarios, since they do not know how to deal with possible disturbances, such as sudden obstacles or human interruptions, which are widespread and often cause unexpected route deviations. In this paper, we present a model-agnostic training paradigm, called Progressive Perturbation-aware Contrastive Learning (PROPER), to enhance the generalization ability of existing VLN agents by requiring them to learn deviation-robust navigation. Specifically, a simple yet effective path perturbation scheme is introduced to implement the route deviation, under which the agent must still navigate successfully following the original instruction. Since directly forcing the agent to learn perturbed trajectories may lead to inefficient training, a progressively perturbed trajectory augmentation strategy is designed, in which the agent self-adaptively learns to navigate under perturbation as its navigation performance on each specific trajectory improves. To encourage the agent to capture the difference brought by perturbation, a perturbation-aware contrastive learning mechanism is further developed by contrasting perturbation-free trajectory encodings with their perturbation-based counterparts. Extensive experiments on R2R show that PROPER can benefit multiple VLN baselines in perturbation-free scenarios. We further collect perturbed path data to construct an introspection subset based on R2R, called Path-Perturbed R2R (PP-R2R). The results on PP-R2R show the unsatisfactory robustness of popular VLN agents and the capability of PROPER to improve navigation robustness.
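As a rough illustration of the contrastive idea, the sketch below computes a generic InfoNCE-style loss using the sample roles listed in the Figure 5 caption: the anchor is a perturbation-free encoding, the positive is the GT trajectory encoding, and the negatives are the perturbation-aware GT encoding plus other trajectories in the minibatch. The function name `info_nce`, the dot-product similarity, and the `temperature` value are assumptions made for illustration; the paper's exact formulation of $\mathcal{L}_{f}$ and $\mathcal{L}_{p}$ may differ.

```python
import math


def dot(u, v):
    """Dot-product similarity between two encodings (an assumed choice)."""
    return sum(a * b for a, b in zip(u, v))


def info_nce(anchor, positive, negatives, temperature=0.1):
    """Generic InfoNCE loss: -log softmax of the positive logit.

    anchor, positive: lists of floats (trajectory encodings).
    negatives: list of encodings (intra- and inter-negatives together).
    temperature: assumed hyperparameter, not taken from the paper.
    """
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, n) / temperature for n in negatives]
    # Log-sum-exp with max subtraction for numerical stability.
    top = max(logits)
    log_denominator = top + math.log(sum(math.exp(x - top) for x in logits))
    return log_denominator - logits[0]


# Hypothetical 2-D encodings named after the symbols in the Figure 5 caption.
e_f = [1.0, 0.0]    # perturbation-free trajectory encoding (anchor)
e_g = [1.0, 0.0]    # GT trajectory encoding (positive)
e_og = [0.0, 1.0]   # perturbation-aware GT encoding (intra-negative)
loss_f = info_nce(e_f, e_g, [e_og], temperature=1.0)
```

Under the same assumptions, the perturbation-based case would swap the roles: the anchor becomes $\boldsymbol{e}_{p}$, the positive becomes $\boldsymbol{e}_{og}$, and the intra-negative becomes $\boldsymbol{e}_{og'}$.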

Paper Structure (23 sections, 8 equations, 9 figures, 9 tables, 2 algorithms)

Introduction
Related Work
Vision-and-Language Navigation.
Navigation Robustness.
Contrastive Learning.
Method
Problem Setup
Perturbed Trajectory Construction
Progressively Perturbed Trajectory Augmentation
Perturbation-aware Contrastive Learning
Experiments
Experimental Setup
Datasets
Evaluation Metrics
Baselines
...and 8 more sections

Figures (9)

Figure 1: In real-world scenarios, a VLN agent required to navigate from the start position $s$ to the goal position may fail to move to $o$ from $c$ in the blue ground-truth (GT) trajectory due to a wrong action decision or possible disturbances, which leads to a route deviation. The red and green trajectories represent a failed and a successful trajectory under deviation, respectively.
Figure 2: Overview of PROPER. (a) Progressively perturbed trajectory augmentation. At each training iteration, new GT matched trajectories are collected and perturbed. The new perturbed trajectories are then combined with previous perturbed trajectories for training. (b) Perturbation-aware contrastive learning. In perturbation-free and perturbation-based scenes, the anchor, positive, and negative samples are obtained by the trajectory encoder $E_{T}$ to calculate the contrastive learning losses $\mathcal{L}_{f}$ and $\mathcal{L}_{p}$, respectively.
Figure 3: Flowchart of navigating under perturbation. At timestep $t$, the agent outputs the hidden state $h_{t}$ and the action $a_{t}$ based on the given instruction, the current panoramic view, and the previous hidden state $h_{t-1}$. If a perturbation is applied, the agent takes an alternative action $a'_{t}$.
Figure 4: Illustration of the perturbation-aware GT trajectory construction. When the edge $(c_{t},c_{t+1})$ in the GT trajectory $p_{n}$ is perturbed, the perturbation-aware GT trajectory $p_{n}^{obs}$ is constructed by connecting the sub-paths $(s_{n},c_{t})$, $(c_{t},m)$, and $(m,d_{n})$. The sub-paths $(s_{n},c_{t})$ and $(m,d_{n})$ overlap with the original $p_{n}$. $m$ is the end point of the shortest path that begins at $c_{t}$ and ends on the sub-path $(c_{t+1},d_{n})$.
Figure 5: Illustration of perturbation-aware contrastive learning under different scenarios. For the perturbation-free trajectory encoding $\boldsymbol{e}_{f}$, the positive sample and the intra-negative sample are the GT trajectory encoding $\boldsymbol{e}_{g}$ and the perturbation-aware GT trajectory encoding $\boldsymbol{e}_{og}$, respectively. For the perturbation-based trajectory encoding $\boldsymbol{e}_{p}$, the positive sample and the intra-negative sample are the perturbation-aware GT trajectory encoding $\boldsymbol{e}_{og}$ and the perturbation-aware GT trajectory encoding of a different perturbation position $\boldsymbol{e}_{og'}$, respectively. For simplicity, the figure omits the inter-negative samples $\boldsymbol{e}_{m}$ in both scenarios, which are other trajectories in the same minibatch.
...and 4 more figures
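
The construction described in the Figure 4 caption can be sketched roughly as follows: delete the edge $(c_{t},c_{t+1})$, search from $c_{t}$ for the nearest node $m$ on the remaining sub-path $(c_{t+1},d_{n})$, and splice the detour into the original path. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name `perturbed_gt_path`, the adjacency-dict graph format, and the use of hop count (breadth-first search) as the distance are all hypothetical choices, and the GT path is assumed to contain no repeated nodes.

```python
from collections import deque


def perturbed_gt_path(graph, gt_path, t):
    """Build a perturbation-aware GT path after deleting edge (c_t, c_{t+1}).

    graph: dict mapping each node to a list of neighbours (undirected).
    gt_path: list of nodes from s_n to d_n, without repeated nodes.
    t: index of c_t in gt_path (so gt_path[t + 1] is c_{t+1}).
    Returns the new path, or None if the rest of the path is unreachable.
    """
    c_t, c_next = gt_path[t], gt_path[t + 1]
    blocked = {(c_t, c_next), (c_next, c_t)}   # the perturbed edge
    targets = set(gt_path[t + 1:])             # nodes on sub-path (c_{t+1}, d_n)
    parent = {c_t: None}
    queue = deque([c_t])
    while queue:
        node = queue.popleft()
        if node in targets:
            # node is m: walk parents back to c_t to recover the detour.
            detour = []
            while node is not None:
                detour.append(node)
                node = parent[node]
            detour.reverse()                   # c_t ... m
            rejoin = gt_path.index(detour[-1], t + 1)
            return gt_path[:t] + detour + gt_path[rejoin + 1:]
        for nxt in sorted(graph[node]):        # sorted for deterministic ties
            if (node, nxt) not in blocked and nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical toy graph: GT path s -> a -> b -> d, with a side node x.
toy_graph = {
    "s": ["a"],
    "a": ["s", "b", "x"],
    "b": ["a", "x", "d"],
    "x": ["a", "b"],
    "d": ["b"],
}
new_path = perturbed_gt_path(toy_graph, ["s", "a", "b", "d"], t=1)
```

In the toy graph, deleting the edge between `a` and `b` forces the detour through `x`, so the rebuilt path rejoins the original one at `b`. A real navigation graph would more likely use metric edge lengths, in which case a weighted shortest-path search would replace the breadth-first search.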
