
Seeing through Uncertainty: Robust Task-Oriented Optimization in Visual Navigation

Yiyuan Pan, Yunzhe Xu, Zhe Liu, Hesheng Wang

TL;DR

NeuRO tackles long-horizon visual navigation under data scarcity and partial observability by coupling a Neural Perception Module with a Robust Optimization Planner. It uses Partially Input Convex Neural Networks (PICNNs) with conformal calibration to produce convex uncertainty sets, and it casts POMDP-like planning as robust optimization. Gradients flow through the planner via KKT-based implicit differentiation, and a Goal Vector Method balances task and environment rewards. A solution feedback loop uses the optimization output to refine actions and rewards, which enables end-to-end training and transfer across task formulations (U-MON and S-MON). On the unordered and sequential MultiON benchmarks, NeuRO achieves SoTA performance and generalizes better to unseen environments, while ablation and scalability analyses highlight practical tradeoffs and the benefits of robustness.
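The KKT-based implicit differentiation mentioned above can be illustrated on the simplest case, an equality-constrained quadratic program. This is a toy sketch under stated assumptions, not NeuRO's planner: the function names `solve_qp_kkt` and `grad_loss_wrt_p` and the squared-error loss are hypothetical. It differentiates the KKT conditions instead of unrolling a solver and checks the result against finite differences. Inequality-constrained or robust problems would additionally need complementary-slackness terms, which general-purpose differentiable-optimization libraries handle.

```python
import numpy as np


def solve_qp_kkt(Q, p, A, b):
    """Solve min_x 0.5 x^T Q x + p^T x  s.t.  A x = b through its KKT system."""
    n, m = Q.shape[0], A.shape[0]
    # KKT conditions: Q x + p + A^T nu = 0 and A x = b.
    K = np.block([[Q, A.T], [A, np.zeros((m, m))]])
    sol = np.linalg.solve(K, np.concatenate([-p, b]))
    return sol[:n], K


def grad_loss_wrt_p(Q, p, A, b, x_target):
    """dL/dp for L(x*) = 0.5 ||x*(p) - x_target||^2 via implicit differentiation."""
    n, m = Q.shape[0], A.shape[0]
    x_star, K = solve_qp_kkt(Q, p, A, b)
    dL_dx = x_star - x_target
    # Differentiating the KKT conditions gives K [dx; dnu] = [-dp; 0].
    # Solving the adjoint system K^T [u; w] = [dL/dx; 0] then yields dL/dp = -u,
    # so the solver itself is never unrolled.
    adj = np.linalg.solve(K.T, np.concatenate([dL_dx, np.zeros(m)]))
    return -adj[:n]


# Usage: compare against central finite differences on a random instance.
rng = np.random.default_rng(0)
n, m = 4, 2
G = rng.standard_normal((n, n))
Q = G @ G.T + n * np.eye(n)          # positive definite objective
A = rng.standard_normal((m, n))      # full row rank with probability 1
b, p, x_target = rng.standard_normal(m), rng.standard_normal(n), rng.standard_normal(n)

x_star, _ = solve_qp_kkt(Q, p, A, b)
grad = grad_loss_wrt_p(Q, p, A, b, x_target)


def loss(p_):
    x, _ = solve_qp_kkt(Q, p_, A, b)
    return 0.5 * np.sum((x - x_target) ** 2)


eps = 1e-6
fd = np.array([(loss(p + eps * e) - loss(p - eps * e)) / (2 * eps) for e in np.eye(n)])
print("max |implicit - finite difference|:", np.max(np.abs(grad - fd)))
```

Because the gradient costs one extra linear solve with the same KKT matrix, it does not depend on how the forward solution was obtained.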

Abstract

Visual navigation is a fundamental problem in embodied AI, yet practical deployments demand long-horizon planning capabilities to address multi-objective tasks. A major bottleneck is data scarcity: policies learned from limited data often overfit and fail to generalize out of distribution (OOD). Existing neural network-based agents typically rely on increasingly complex architectures, which paradoxically become counterproductive in the small-sample regime. This paper introduces NeuRO, an integrated learning-to-optimize framework that tightly couples perception networks with downstream task-level robust optimization. Specifically, NeuRO addresses two core difficulties in this integration: (i) it transforms noisy visual predictions made under data scarcity into convex uncertainty sets using Partially Input Convex Neural Networks (PICNNs) with conformal calibration, and these sets directly parameterize the optimization constraints; and (ii) it reformulates planning under partial observability as a robust optimization problem, yielding uncertainty-aware policies that transfer across environments. Extensive experiments on both unordered and sequential multi-object navigation tasks demonstrate that NeuRO establishes SoTA performance, particularly in generalization to unseen environments. Our work thus presents a significant advancement toward robust, generalizable autonomous agents.
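To make the convexity property concrete, here is a minimal NumPy sketch of a simplified PICNN-style network. It is unconstrained in a context input x (for example, visual features) but convex in y (for example, the predicted quantity an uncertainty set should cover). The class name, layer sizes, and initialization are hypothetical, and the paper's architecture may differ. Convexity in y is what makes a sublevel set {y : g(x, y) <= q} convex; the usage lines at the end spot-check it through Jensen's inequality.

```python
import numpy as np


def relu(v):
    return np.maximum(v, 0.0)


class SimplePICNN:
    """Scalar g(x, y) that may be non-convex in the context x but is convex in y.

    Convexity in y follows because y enters every layer affinely, the
    hidden-to-hidden weights Wz on the z-path are non-negative, and ReLU
    is convex and non-decreasing.
    """

    def __init__(self, dim_x, dim_y, hidden=16, layers=3, seed=0):
        rng = np.random.default_rng(seed)

        def init(*shape):
            return 0.5 * rng.standard_normal(shape)

        self.layers = layers
        outs = [hidden] * (layers - 1) + [1]          # last layer is the scalar score
        ins_u = [dim_x] + [hidden] * (layers - 1)     # width of the context features u_k
        self.Wx = [init(hidden, ins_u[k]) for k in range(layers - 1)]  # context path, unconstrained
        self.bx = [init(hidden) for _ in range(layers - 1)]
        self.Wy = [init(outs[k], dim_y) for k in range(layers)]        # y enters affinely
        self.Wu = [init(outs[k], ins_u[k]) for k in range(layers)]     # context injected per layer
        self.Wz = [None] + [np.abs(init(outs[k], hidden)) for k in range(1, layers)]  # kept >= 0
        self.b = [init(outs[k]) for k in range(layers)]

    def __call__(self, x, y):
        u, z = x, None
        for k in range(self.layers):
            pre = self.Wy[k] @ y + self.Wu[k] @ u + self.b[k]
            if z is not None:
                pre = pre + self.Wz[k] @ z
            last = k == self.layers - 1
            z = pre if last else relu(pre)
            if not last:
                u = relu(self.Wx[k] @ u + self.bx[k])
        return float(z[0])


# Usage: spot-check convexity in y with Jensen's inequality at a fixed context x.
rng = np.random.default_rng(1)
g = SimplePICNN(dim_x=5, dim_y=3)
x = rng.standard_normal(5)
violations = 0
for _ in range(1000):
    y1, y2, t = rng.standard_normal(3), rng.standard_normal(3), rng.uniform()
    lhs = g(x, t * y1 + (1 - t) * y2)
    rhs = t * g(x, y1) + (1 - t) * g(x, y2)
    violations += int(lhs > rhs + 1e-9)
print("Jensen violations:", violations)
```

The only constraint imposed during training in this design is keeping `Wz` non-negative (here via `np.abs` at initialization); a trained model would need to re-project or reparameterize those weights after each update.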

Paper Structure

This paper contains 22 sections, 2 theorems, 28 equations, 7 figures, 9 tables, and 1 algorithm.

Key Result

Proposition 1

Let the dataset $\mathcal{D}=\{ (x_n, M_n)\}^N_{n=1}$ be sampled i.i.d. from the implicit distribution $\mathcal{P}$ encountered during the training phase, and let $q$ be the $(1-\alpha)$-quantile of the set $\{g(x_n, M_n)\}_{n=1}^N$. Then $\Omega(x)$ satisfies the following guarantee.
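Proposition 1 describes split-conformal calibration. The minimal sketch below assumes that $\Omega(x) = \{M : g(x, M) \le q\}$ and substitutes a hypothetical quadratic score for the trained PICNN; it computes $q$ from calibration scores and measures empirical coverage on held-out data. It uses the finite-sample-corrected index $\lceil (N+1)(1-\alpha) \rceil$, the common split-conformal convention; the exact quantile definition in the paper's Proposition 1 may differ slightly. The helper names `conformal_quantile`, `in_uncertainty_set`, `g`, and `sample` are illustrative.

```python
import math

import numpy as np


def conformal_quantile(scores, alpha):
    """Return the ceil((N + 1)(1 - alpha))-th smallest calibration score.

    This finite-sample-corrected (1 - alpha) quantile is the usual split-conformal
    choice: marginally over calibration and test draws of exchangeable data,
    it gives P[g(x, M) <= q] >= 1 - alpha.
    """
    scores = np.sort(np.asarray(scores, dtype=float))
    k = math.ceil((len(scores) + 1) * (1 - alpha))
    return np.inf if k > len(scores) else scores[k - 1]


def in_uncertainty_set(score_fn, x, M, q):
    """Membership test for the (assumed) set Omega(x) = {M : g(x, M) <= q}."""
    return score_fn(x, M) <= q


# Usage on synthetic data with a hypothetical score that is convex in M.
rng = np.random.default_rng(0)
d, alpha = 3, 0.1


def g(x, M):
    # Stand-in for a trained PICNN score; any score convex in M gives a convex Omega(x).
    return float(np.sum((M - 2.0 * x) ** 2))


def sample(n):
    X = rng.standard_normal((n, d))
    return X, 2.0 * X + 0.5 * rng.standard_normal((n, d))


X_cal, M_cal = sample(500)
q = conformal_quantile([g(x, M) for x, M in zip(X_cal, M_cal)], alpha)

X_test, M_test = sample(20000)
coverage = np.mean([in_uncertainty_set(g, x, M, q) for x, M in zip(X_test, M_test)])
print(f"q = {q:.3f}, empirical coverage = {coverage:.3f}")
```

With $\alpha = 0.1$ the measured coverage should land near 0.9; it is a random quantity, so different seeds fluctuate around the target rather than matching it exactly.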

Figures (7)

  • Figure 1: Overview of the work. (Left) The foundational concept of a task-based optimization framework and its inherent challenges when directly applied to visual navigation. (Right) NeuRO framework addresses this mismatch through a conformal visual-processing method and a robust optimization formulation for decision-making.
  • Figure 2: Architecture of NeuRO. (Left) The decision-making process of a purely network-based agent at each navigation step. (Right) Our proposed optimization module, which redirects the agent's training back toward the task itself.
  • Figure 3: Learning curves of NeuRO and the baseline during training on different tasks.
  • Figure 4: Visualization of the learned object transition matrix $\mathbf{M_i^t}$. (Left) Mapping of objects to the optimization grid $H$ and their belief representation in the matrix $M_i^t$; darkened columns in $M$ denote high-confidence presence in the corresponding cells. (Right) Navigation scenarios, with red arrows pointing to the cell $v$ where the agent predicts the target object is most likely located.
  • Figure 5: The workflow for solving the capacity expansion problem with the NeuRO framework.
  • ...and 2 more figures

Theorems & Definitions (3)

  • Definition 1: Marginal Coverage
  • Proposition 1: Coverage with Quantile
  • Proposition 1
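For reference, marginal coverage in conformal prediction is usually stated as below. We assume, without confirmation from the listing above, that Definition 1 takes this form, with $\Omega(x)$ taken to be the $q$-sublevel set of the score $g(x,\cdot)$. If $g$ is convex in $M$, as a PICNN is in its convex input, this set is convex and can serve as the inner constraint set of a robust planning problem. The action $a$, feasible set $\mathcal{A}$, and reward $R$ are illustrative placeholders, not the paper's notation.

```latex
\Omega(x) = \{\, M : g(x, M) \le q \,\}, \qquad
\Pr_{(x, M) \sim \mathcal{P}}\big[\, M \in \Omega(x) \,\big] \;\ge\; 1 - \alpha,

a^\star(x) \in \arg\max_{a \in \mathcal{A}} \; \min_{M \in \Omega(x)} \; R(a, M).
```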