MetricNet: Recovering Metric Scale in Generative Navigation Policies

Abhijeet Nayak, Débora Oliveira Makowski, Samiran Gode, Cordelia Schmid, Wolfram Burgard

TL;DR

This work proposes MetricNet, an effective add-on for generative navigation that predicts the metric distance between waypoints and thereby grounds policy outputs in metric coordinates. It also proposes MetricNav, which integrates MetricNet into a navigation policy to guide the robot away from obstacles while still moving towards the goal.

Abstract

Generative navigation policies have made rapid progress in improving end-to-end learned navigation. Despite their promising results, this paradigm has two structural problems. First, the sampled trajectories exist in an abstract, unscaled space without metric grounding. Second, the control strategy discards the full path, instead moving directly towards a single waypoint. This leads to short-sighted and unsafe actions, moving the robot towards obstacles that a complete and correctly scaled path would circumvent. To address these issues, we propose MetricNet, an effective add-on for generative navigation that predicts the metric distance between waypoints, grounding policy outputs in metric coordinates. We evaluate our method in simulation with a new benchmarking framework and show that executing MetricNet-scaled waypoints significantly improves both navigation and exploration performance. Beyond simulation, we further validate our approach in real-world experiments. Finally, we propose MetricNav, which integrates MetricNet into a navigation policy to guide the robot away from obstacles while still moving towards the goal.

Paper Structure

This paper contains 28 sections, 14 equations, 7 figures, 2 tables.

Figures (7)

  • Figure 1: Two examples of MetricNet transforming waypoints (red) sampled from NOMAD [sridhar2024nomad] into a collision-free plan (green) around an obstacle to reach the goal. First, our novel SCALENET module grounds the waypoints in the 3D world by predicting a metric scale. Second, MetricNav leverages these scaled points for goal following and collision avoidance around objects. The trajectories are plotted on a Truncated Signed Distance Function (TSDF).
  • Figure 2: In (a), we illustrate that with a velocity controller, selecting only the goal waypoint can still lead to collisions even though the predicted trajectory in metric scale is free of obstacles. Note that this waypoint only indicates the direction of motion. In (b), we show that, depending on the scale factor $\phi$, the resulting path may collide with walls if the scale is too small or too large for the environment. This highlights the need not only for the correct scale but also for a controller that executes the whole trajectory in metric space.
  • Figure 3: SCALENET architecture. Our network estimates a factor that converts the unscaled output of generative visual-goal navigation policies into metric scale. SCALENET first tokenizes patches of the current observation using an image encoder. In parallel, the observation is processed with the pre-trained encoder from Depth-Anything-V2 [yang2025depth] to produce depth patch tokens. A transformer then processes the combined token sequence, with a prepended CLS token that summarizes the aggregated representation. Finally, the output CLS token is passed through an MLP to predict the scale. The original unscaled waypoints (blue) are multiplied by this scale, generating grounded trajectories in metric coordinates (red).
  • Figure 4: The MetricNav architecture uses a navigation policy to sample $\mathcal{K}$ trajectories (blue). Next, spherical k-means clustering is used to estimate the most common goal trajectory (red). This trajectory is then used to derive a cost composed of a goal-following term and a collision-avoidance term, using a TSDF estimated from monocular depth. This cost is then fed into the diffusion policy to guide the sampled trajectory into obstacle-free space while moving towards the goal. The white arrows show the goal-guiding gradients, whereas the brown arrows show the collision-avoidance gradients.
  • Figure 5: Box plots for navigation and exploration in simulation across velocity and position control, using either a constant scale or the scale predicted by SCALENET. SCALENET outperforms the constant scale proposed by previous work across all base navigation policies. Moreover, using metric waypoints predicted by SCALENET with a position controller improves the results compared to velocity control. Note that in all navigation experiments, at least one seed fails and at least one reaches the goal across all baseline policies. The key difference lies in the median of the distribution and in how many seeds succeed on average.
  • ...and 2 more figures
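
The last step described in the Figure 3 caption, multiplying the unscaled waypoints by a predicted scale, can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `scale_waypoints` and `path_length`, the 2D robot-frame convention with the robot at the origin, and the scale value 0.5 are all assumptions chosen for illustration. In the paper, the scale would come from the network rather than being hard-coded.

```python
import math
from typing import List, Tuple

Waypoint = Tuple[float, float]  # (x, y) in the robot frame; an assumed convention


def scale_waypoints(unscaled: List[Waypoint], scale: float) -> List[Waypoint]:
    """Multiply every unscaled waypoint by a single scalar scale factor."""
    if scale <= 0:
        raise ValueError("scale must be positive")
    return [(x * scale, y * scale) for x, y in unscaled]


def path_length(waypoints: List[Waypoint]) -> float:
    """Length of the polyline from the robot origin (0, 0) through the waypoints."""
    total = 0.0
    prev = (0.0, 0.0)
    for wp in waypoints:
        total += math.hypot(wp[0] - prev[0], wp[1] - prev[1])
        prev = wp
    return total


# Hypothetical unscaled policy output and a hypothetical predicted scale.
unscaled = [(1.0, 0.0), (2.0, 0.0), (2.0, 1.0)]
predicted_scale = 0.5  # stand-in for a network prediction

metric = scale_waypoints(unscaled, predicted_scale)
print(metric)               # waypoints after scaling
print(path_length(metric))  # path length after scaling
```

Because one scalar multiplies every waypoint, the shape of the path is preserved and only its extent changes. This is why, as Figure 2(b) notes, a scale that is too small or too large for the environment can move an otherwise sensible path into a wall.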