CostNav: A Navigation Benchmark for Cost-Aware Evaluation of Embodied Agents

Haebin Seong, Sungmin Kim, Minchan Kim, Yongjun Cho, Myunchul Joe, Suhwan Choi, Jaeyoon Jung, Jiyong Youn, Yoonshik Kim, Samwoo Seong, Yubeen Park, Youngjae Yu, Yunsung Lee

TL;DR

CostNav is a cost-aware navigation benchmark that translates navigation performance into economic outcomes by modeling the full lifecycle of an autonomous delivery system: upfront hardware and training costs, per-run energy and maintenance costs, revenue with SLA adjustments, and break-even analysis. The framework is instantiated in a high-fidelity simulator (Isaac Lab) with the COCO delivery robot and evaluates a learning-based on-device baseline in Level 1–2 urban sidewalk scenarios. The baseline reaches 43.0% SLA compliance but loses $30.009 per run, with collision-induced maintenance accounting for 99.7% of per-run costs. These results show a substantial gap between current navigation performance and commercial viability, and they point to collision avoidance and SLA compliance as the primary levers for profitability. By providing cost-aware metrics and an economic evaluation workflow, CostNav enables like-for-like comparisons across navigation paradigms and supports data-driven deployment decisions for real-world embodied AI systems.

Abstract

Existing navigation benchmarks focus on task success metrics while overlooking economic viability, which is critical for commercial deployment of autonomous delivery robots. We introduce CostNav, a Micro-Navigation Economic Testbed that evaluates embodied agents through comprehensive cost-revenue analysis aligned with real-world business operations. CostNav models the complete economic lifecycle, including hardware, training, energy, and maintenance costs as well as delivery revenue with service-level agreements (SLAs), using industry-derived parameters. To our knowledge, CostNav is the first work to quantitatively expose the gap between navigation research metrics and commercial viability, revealing that optimizing for task success fundamentally differs from optimizing for economic deployment. Our cost model uses parameters derived from industry data sources (energy rates, delivery service pricing), and we project from a reduced-scale simulation to realistic deliveries. Under this projection, the baseline achieves 43.0% SLA compliance but is not commercially viable: it yields a loss of $30.009 per run with no finite break-even point. Operating costs are dominated by collision-induced maintenance, which accounts for 99.7% of per-run costs and makes collision avoidance a key optimization target. We demonstrate a learning-based on-device navigation baseline and establish a foundation for evaluating rule-based navigation, imitation learning, and cost-aware RL training. CostNav bridges the gap between navigation research and commercial deployment, enabling data-driven decisions about economic trade-offs across navigation paradigms.
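To make the "no finite break-even point" statement concrete, the following is a minimal sketch of a break-even calculation, assuming a simple linear model in which cumulative profit after N runs is N times the per-run profit minus a fixed upfront cost. The class `RunEconomics`, the function `break_even_runs`, and the 10,000-dollar upfront cost are hypothetical and chosen only for illustration; the only figure taken from the abstract is the per-run loss of $30.009. The paper's actual cost model may differ.

```python
from dataclasses import dataclass
from math import ceil
from typing import Optional


@dataclass
class RunEconomics:
    """Per-run averages in dollars. Field names are hypothetical, not the paper's."""
    revenue: float           # delivery revenue after any SLA adjustment
    energy_cost: float       # energy cost per run
    maintenance_cost: float  # maintenance cost per run (e.g. collision repairs)

    @property
    def profit(self) -> float:
        # Per-run profit is revenue minus per-run operating costs.
        return self.revenue - (self.energy_cost + self.maintenance_cost)


def break_even_runs(fixed_cost: float, profit_per_run: float) -> Optional[int]:
    """Smallest number of runs whose cumulative profit covers fixed_cost.

    Returns None when per-run profit is not positive, since cumulative
    profit then never recovers the upfront cost.
    """
    if profit_per_run <= 0:
        return None
    return ceil(fixed_cost / profit_per_run)


# Per-run loss reported in the abstract; the upfront cost is a placeholder.
print(break_even_runs(fixed_cost=10_000.0, profit_per_run=-30.009))  # None
```

Under this linear assumption, any non-positive per-run profit rules out a finite break-even point regardless of the upfront cost, which is why reducing the dominant per-run cost matters more than the size of the initial investment.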

Paper Structure

This paper contains 25 sections, 13 equations, 3 figures, 4 tables.

Figures (3)

  • Figure 1: A motivational example highlighting the core idea behind the CostNav benchmark. Traditional metrics like success rate or collision rate overlook navigation behaviors that can lead to costly outcomes. For instance, overly sharp turning can spill beverages and cause unnecessary expenses. This gap motivates CostNav, which evaluates navigation through an economic lens.
  • Figure 2: End-to-end process of the CostNav benchmark, from simulation environments to break-even point analysis. Simulation logs capture key operational signals—such as collision dynamics, energy usage, delivery time, and food intactness—that reflect how a robot behaves in realistic delivery scenarios. These signals are then combined with real-world cost and revenue models to compute profit curves and determine each method’s break-even point. By translating navigation behaviors into economic outcomes, CostNav enables a leaderboard that ranks embodied agents based on financial performance rather than traditional task-centric metrics.
  • Figure 3: Economic Difficulty Levels for Navigation Evaluation. We propose a taxonomy of three difficulty levels to systematically evaluate economic viability. Level 1 (Ideal) establishes baseline performance under sparse conditions. Level 2 (Dynamic) introduces pedestrian traffic to test navigation in crowded environments. Level 3 (Real-World) incorporates adverse conditions (weather, lighting, long-tail failures) where human intervention costs ($C_{\text{rescue}}$) become significant. This work focuses on Levels 1 and 2.