Non-submodular Visual Attention for Robot Navigation

Reza Vafaee; Kian Behzad; Milad Siami; Luca Carlone; Ali Jadbabaie

TL;DR

The paper tackles task-aware feature selection for Visual-Inertial Navigation (VIN) under tight computational constraints by formulating a non-submodular MSE objective and solving it with four polynomial-time approximation algorithms. It couples a forward-simulation (anticipation) model with IMU and vision models to define per-feature contributions $\Delta_l$ and the information matrix $\Omega_{\mathsf{S}} = \Omega_{\emptyset} + \sum_{l\in\mathsf{S}} \Delta_l$ of a selected feature set $\mathsf{S}$, and it aims to minimize $\mathrm{Trace}(\Omega_{\mathsf{S}}^{-1})$. The authors derive performance guarantees using the submodularity ratio $\gamma$ and the curvature $\alpha$, and complement the classic greedy method with three faster variants: a low-rank greedy that uses Sherman-Morrison-Woodbury (SMW) updates, a randomized greedy, and a linearization-based greedy built on a first-order Taylor surrogate. Experiments on EuRoC and a control-enabled QCar platform support the theoretical results, showing near-optimality for the greedy variants, real-time feasibility for the linearized greedy, and robust performance under practical sensing and control dynamics. The work offers practical guidance for selecting informative visual features in VIN to balance accuracy against on-board computation, with potential extensions to multi-agent settings and broader sensing modalities.
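The selection problem above can be illustrated with a minimal sketch of plain greedy selection: at each step, add the feature whose $\Delta_l$ most reduces $\mathrm{Trace}(\Omega_{\mathsf{S}}^{-1})$. The function names (`trace_inv`, `greedy_select`), the list-of-lists matrix representation, and the toy matrices are assumptions made for illustration, not the authors' implementation; a practical version would use a numerical linear algebra library.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def mat_add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise sum of two equally sized matrices."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def trace_inv(m: Matrix) -> float:
    """Trace of the inverse of an invertible matrix (Gauss-Jordan on [m | I])."""
    n = len(m)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for c in range(n):
        # Partial pivoting: bring the largest remaining entry of column c up.
        p = max(range(c, n), key=lambda r: abs(aug[r][c]))
        aug[c], aug[p] = aug[p], aug[c]
        piv = aug[c][c]
        aug[c] = [v / piv for v in aug[c]]
        for r in range(n):
            if r != c:
                f = aug[r][c]
                aug[r] = [vr - f * vc for vr, vc in zip(aug[r], aug[c])]
    # The right half of the augmented matrix now holds the inverse.
    return sum(aug[i][n + i] for i in range(n))


def greedy_select(omega0: Matrix, deltas: Sequence[Matrix], k: int) -> List[int]:
    """Pick up to k feature indices, each minimizing Trace((Omega_S + Delta_l)^-1)."""
    selected: List[int] = []
    remaining = list(range(len(deltas)))
    omega = omega0
    for _ in range(min(k, len(deltas))):
        best = min(remaining, key=lambda l: trace_inv(mat_add(omega, deltas[l])))
        selected.append(best)
        remaining.remove(best)
        omega = mat_add(omega, deltas[best])
    return selected


if __name__ == "__main__":
    # Toy data (hypothetical): identity prior and three rank-one contributions.
    prior = [[1.0, 0.0], [0.0, 1.0]]
    contributions = [
        [[4.0, 0.0], [0.0, 0.0]],
        [[0.0, 0.0], [0.0, 1.0]],
        [[1.0, 0.0], [0.0, 0.0]],
    ]
    print(greedy_select(prior, contributions, 2))
```

Each greedy step here recomputes a full inverse for every candidate, which is the cost that the low-rank and linearized variants are designed to avoid.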

Abstract

This paper presents a task-oriented computational framework to enhance Visual-Inertial Navigation (VIN) in robots, addressing challenges such as limited time and energy resources. The framework strategically selects visual features using a Mean Squared Error (MSE)-based, non-submodular objective function and a simplified dynamic anticipation model. To address the NP-hardness of this problem, we introduce four polynomial-time approximation algorithms: a classic greedy method with constant-factor guarantees; a low-rank greedy variant that significantly reduces computational complexity; a randomized greedy sampler that balances efficiency and solution quality; and a linearization-based selector that uses a first-order Taylor expansion for near-constant-time execution. We establish rigorous performance bounds by leveraging submodularity ratios, curvature, and element-wise curvature analyses. Extensive experiments on both standardized benchmarks and a custom control-aware platform validate our theoretical results, demonstrating that these methods achieve strong approximation guarantees while enabling real-time deployment.
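The low-rank and linearization-based variants named above can be related to two standard matrix identities. This is a sketch under the assumption, not stated in this summary, that each feature contribution factors as $\Delta_l = U_l U_l^{\top}$ with $U_l$ having only a few columns; the paper's exact surrogate may differ.

For the low-rank update, the Sherman-Morrison-Woodbury identity gives

$$(\Omega_{\mathsf{S}} + U_l U_l^{\top})^{-1} = \Omega_{\mathsf{S}}^{-1} - \Omega_{\mathsf{S}}^{-1} U_l \left(I + U_l^{\top} \Omega_{\mathsf{S}}^{-1} U_l\right)^{-1} U_l^{\top} \Omega_{\mathsf{S}}^{-1},$$

so, by the cyclic property of the trace,

$$\mathrm{Trace}\left((\Omega_{\mathsf{S}} + \Delta_l)^{-1}\right) = \mathrm{Trace}(\Omega_{\mathsf{S}}^{-1}) - \mathrm{Trace}\left(\left(I + U_l^{\top} \Omega_{\mathsf{S}}^{-1} U_l\right)^{-1} U_l^{\top} \Omega_{\mathsf{S}}^{-2} U_l\right).$$

Only a small matrix, with one row and column per column of $U_l$, has to be inverted for each candidate. For the linearized score, a first-order expansion of the inverse around $\Omega_{\mathsf{S}}$ gives

$$\mathrm{Trace}\left((\Omega_{\mathsf{S}} + \Delta_l)^{-1}\right) \approx \mathrm{Trace}(\Omega_{\mathsf{S}}^{-1}) - \mathrm{Trace}\left(\Omega_{\mathsf{S}}^{-2} \Delta_l\right),$$

which is accurate when $\Delta_l$ is small relative to $\Omega_{\mathsf{S}}$. Under this approximation each candidate's score is linear in $\Delta_l$, so candidates can be ranked without any per-candidate matrix inversion.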
