Radar-Guided Polynomial Fitting for Metric Depth Estimation

Patrick Rim, Hyoungseob Park, Vadim Ezhov, Jeffrey Moon, Alex Wong

TL;DR

POLAR tackles metric depth estimation by transforming scaleless MDE outputs through radar-guided polynomial fitting. By predicting polynomial coefficients from a fused radar-MDE representation, it enables depth-dependent, non-uniform corrections with inflection points, outperforming affine-correction approaches. A first-derivative monotonicity loss preserves local depth order while allowing cross-region refinements. The approach achieves state-of-the-art accuracy and real-time performance across multiple datasets, highlighting the practical viability of radar-guided scene-fitting for metric depth estimation.

Abstract

We propose POLAR, a novel radar-guided depth estimation method that introduces polynomial fitting to efficiently transform scaleless depth predictions from pretrained monocular depth estimation (MDE) models into metric depth maps. Unlike existing approaches that rely on complex architectures or expensive sensors, our method is grounded in a fundamental insight: although MDE models often infer reasonable local depth structure within each object or local region, they may misalign these regions relative to one another, making a linear scale and shift (affine) transformation insufficient given three or more of these regions. To address this limitation, we use polynomial coefficients predicted from cheap, ubiquitous radar data to adaptively adjust predictions non-uniformly across depth ranges. In this way, POLAR generalizes beyond affine transformations and is able to correct such misalignments by introducing inflection points. Importantly, our polynomial fitting framework preserves structural consistency through a novel training objective that enforces local monotonicity via first-derivative regularization. POLAR achieves state-of-the-art performance across three datasets, outperforming existing methods by an average of 24.9% in MAE and 33.2% in RMSE, while also achieving state-of-the-art efficiency in terms of latency and computational cost.
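As a rough illustration of the idea described above, the sketch below applies a polynomial to a scaleless depth vector and computes a first-derivative penalty that is zero whenever the fitted curve is non-decreasing at the observed depths. The coefficients are hand-picked rather than predicted from radar, and the hinge form of the penalty, the function names, and the coefficient ordering are all assumptions made for illustration; this summary does not give the paper's exact loss.

```python
import numpy as np


def apply_polynomial(d_hat, coeffs):
    """Evaluate sum_i coeffs[i] * d_hat**i element-wise (coeffs[0] is the constant term)."""
    return np.polynomial.polynomial.polyval(d_hat, coeffs)


def monotonicity_penalty(d_hat, coeffs):
    """Mean hinge penalty on a negative first derivative at the observed depths.

    Illustrative form only; not taken from the paper.
    """
    deriv_coeffs = np.polynomial.polynomial.polyder(coeffs)
    slope = np.polynomial.polynomial.polyval(d_hat, deriv_coeffs)
    return float(np.mean(np.maximum(0.0, -slope)))


# Hypothetical scaleless MDE output in [0, 1] and hand-picked cubic coefficients.
d_hat = np.linspace(0.0, 1.0, 5)
coeffs = np.array([2.0, 30.0, -20.0, 40.0])  # c0 + c1*x + c2*x^2 + c3*x^3

metric_depth = apply_polynomial(d_hat, coeffs)
penalty = monotonicity_penalty(d_hat, coeffs)
```

With these coefficients the derivative is positive everywhere, so the depth ordering of the input is preserved and the penalty is zero. A polynomial that decreases over part of the input range would incur a positive penalty.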

Paper Structure

This paper contains 20 sections, 2 theorems, 11 equations, 7 figures, 14 tables.

Key Result

Proposition 1

There exist infinitely many sets of $k \geq 3$ (MDE prediction $\hat{d}$, ground truth $d$) pairs such that no global scale $\alpha$ and shift $\beta$ satisfy $d = \alpha \hat{d} + \beta$ for all $k$ pairs simultaneously.
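One concrete instance of the proposition can be checked numerically. The sketch below uses three made-up (prediction, ground truth) pairs that do not lie on a single line; the values are hypothetical and are not drawn from any dataset. The best least-squares affine fit leaves a large residual, while a quadratic, which has three free coefficients, passes through all three pairs.

```python
import numpy as np

# Hypothetical (MDE prediction, ground-truth depth) pairs for three regions.
d_hat = np.array([0.2, 0.5, 0.8])
d = np.array([4.0, 10.0, 30.0])

# Best affine fit d ~ alpha * d_hat + beta in the least-squares sense.
A = np.stack([d_hat, np.ones_like(d_hat)], axis=1)
(alpha, beta), *_ = np.linalg.lstsq(A, d, rcond=None)
affine_residual = np.max(np.abs(A @ np.array([alpha, beta]) - d))

# A quadratic has three coefficients, so it interpolates three pairs exactly.
quad = np.polynomial.polynomial.polyfit(d_hat, d, deg=2)
quad_residual = np.max(np.abs(np.polynomial.polynomial.polyval(d_hat, quad) - d))
```

Any three pairs that are not collinear in the $(\hat{d}, d)$ plane behave the same way, which is why such sets are infinitely many.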

Figures (7)

  • Figure 1: If an MDE model predicts incorrect relative depths between three or more objects, an affine scale-and-shift (dashed) cannot resolve this misalignment. POLAR (solid) overcomes this limitation by learning an $N$'th-order polynomial fit with up to $N-2$ inflection points.
  • Figure 2: Method Overview. POLAR transforms scaleless MDE predictions into metric depth using polynomial fitting guided by radar features. Learnable prototypes extract patterns in the configurations of radar point clouds and are used to aggregate spatially-informed radar features. The geometry-aware MDE features are fused with the radar features via a learnable soft-correspondence module to yield a unified scene representation that is used to predict polynomial coefficients for fitting. This enables non-uniform corrections that improve accuracy beyond affine transformations.
  • Figure 3: Qualitative results on nuScenes. GET-UP and RadarCam-Depth (RC-D) fail to reconstruct entire regions, yielding objects with large depth errors. Raw MDE yields reasonable relative reconstructions but suffers from incorrect global scale and cross-object misalignments. POLAR leverages polynomial fitting to recover a global scale and correct these misalignments.
  • Figure 4: POLAR leverages spatial information from radar points to predict higher-degree polynomial transformations that can correct non-affine errors in MDE predictions.
  • Figure 5: To quantify absolute improvement in our qualitative results, we provide the colorbars that were used in all examples in Figs. \ref{fig:supp_mat_qualitative1} and \ref{fig:lidar_vs_radar}, as well as Figs. 3 and 4 in the main paper.
  • ...and 2 more figures
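
The bound of $N-2$ inflection points quoted in the Figure 1 caption follows from a standard degree count. Writing the fitted polynomial with coefficients $c_i$ (notation assumed here, not taken from the paper):

```latex
f(\hat{d}) = \sum_{i=0}^{N} c_i \, \hat{d}^{\,i},
\qquad
f''(\hat{d}) = \sum_{i=2}^{N} i(i-1)\, c_i \, \hat{d}^{\,i-2}.
```

Since $f''$ is a polynomial of degree at most $N-2$, it has at most $N-2$ real roots, and an inflection point can occur only at a root where $f''$ changes sign. An affine map ($N=1$) therefore has none, which is consistent with its inability to bend between three misaligned regions.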

Theorems & Definitions (3)

  • Proposition 1
  • Proof: Proof by Construction
  • Corollary 1