Shedding Light on VLN Robustness: A Black-box Framework for Indoor Lighting-based Adversarial Attack
Chenyang Li, Wenbing Tang, Yihao Huang, Sinong Simon Zhan, Ming Hu, Xiaojun Jia, Yang Liu
TL;DR
The paper tackles robustness gaps in Vision-and-Language Navigation (VLN) by examining realistic indoor perturbations. It introduces Indoor Lighting-based Adversarial Attack (ILA), a black-box framework with two modes: Static Indoor Lighting-based Attack (SILA) and Dynamic Indoor Lighting-based Attack (DILA). SILA searches for a global lighting intensity that degrades navigation via a trajectory-weighted loss $\mathcal{J}_{\text{static}} = \sum_{t=1}^{\hat{T}} w_t \|\operatorname{pos}(s_t) - G\|_2$ with $w_t = \frac{t}{\hat{T}}$, so that later steps of the trajectory receive larger weights. DILA uses a one-step lookahead surrogate to switch the lights on or off at critical decision points based on heading deviation. The black-box optimization updates the intensity using two candidate values $l^{k+}$ and $l^{k-}$, with $\xi^k = \operatorname{sign}(\mathcal{J}(\mathcal{L}^{k+}) - \mathcal{J}(\mathcal{L}^{k-}))$ and $\Delta l \leftarrow \operatorname{clip}(\Delta l + \alpha \cdot b^k \cdot \xi^k)$. In the dynamic mode, switches are decided by the angle $\beta_{t} = \arccos\left( \frac{\vec{v}_1 \cdot \vec{v}_2}{\|\vec{v}_1\|_2 \|\vec{v}_2\|_2} \right)$ between two heading vectors, with a positive delta indicating disruption. Evaluations on SPOC and FLaRe across ObjectNav, Fetch, and RoomVisit show substantial reductions in navigation success and increased episode length under attack, highlighting indoor lighting as a practical dimension for robustness evaluation. The results motivate future work on illumination-aware training and defense mechanisms for safer embodied AI systems.
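The three formulas in the summary above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation: the function names, the 2D position type, and the clip bounds `lo` and `hi` are assumptions, and because the summary does not define $b^k$ it is treated here as a scalar supplied by the caller. How the candidates $l^{k+}$ and $l^{k-}$ are generated is likewise not specified, so the sketch only consumes their loss values.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # assumed 2D agent position


def static_loss(positions: Sequence[Point], goal: Point) -> float:
    """J_static = sum over t = 1..T of (t / T) * ||pos(s_t) - G||_2."""
    total_steps = len(positions)
    return sum(
        (t / total_steps) * math.dist(pos, goal)
        for t, pos in enumerate(positions, start=1)
    )


def sign(x: float) -> int:
    """Return -1, 0, or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def update_offset(delta_l: float, loss_plus: float, loss_minus: float,
                  alpha: float, b: float, lo: float, hi: float) -> float:
    """One step of delta_l <- clip(delta_l + alpha * b * xi).

    xi is the sign of J(L^{k+}) - J(L^{k-}); lo and hi are assumed clip bounds.
    """
    xi = sign(loss_plus - loss_minus)
    return min(hi, max(lo, delta_l + alpha * b * xi))


def heading_deviation(v1: Sequence[float], v2: Sequence[float]) -> float:
    """Angle beta (radians) between two non-zero heading vectors."""
    dot = sum(x * y for x, y in zip(v1, v2))
    norm1, norm2 = math.hypot(*v1), math.hypot(*v2)
    if norm1 == 0.0 or norm2 == 0.0:
        raise ValueError("heading vectors must be non-zero")
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    cos_beta = max(-1.0, min(1.0, dot / (norm1 * norm2)))
    return math.acos(cos_beta)
```

For example, `static_loss([(0, 0), (3, 4)], (3, 4))` weights the first step (distance 5) by 1/2 and the final step (distance 0) by 1, so an agent that ends far from the goal is penalized more heavily than one that only starts far from it.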
Abstract
Vision-and-Language Navigation (VLN) agents have made remarkable progress, but their robustness remains insufficiently studied. Existing adversarial evaluations often rely on perturbations that manifest as unusual textures rarely encountered in everyday indoor environments. Errors under such contrived conditions have limited practical relevance, as real-world agents are unlikely to encounter such artificial patterns. In this work, we focus on indoor lighting, an intrinsic yet largely overlooked scene attribute that strongly influences navigation. We propose Indoor Lighting-based Adversarial Attack (ILA), a black-box framework that manipulates global illumination to disrupt VLN agents. Motivated by typical household lighting usage, we design two attack modes: Static Indoor Lighting-based Attack (SILA), where the lighting intensity remains constant throughout an episode, and Dynamic Indoor Lighting-based Attack (DILA), where lights are switched on or off at critical moments to induce abrupt illumination changes. We evaluate ILA on two state-of-the-art VLN models across three navigation tasks. Results show that ILA significantly increases failure rates while reducing trajectory efficiency, revealing previously unrecognized vulnerabilities of VLN agents to realistic indoor lighting variations.
