
GravCal: Single-Image Calibration of IMU Gravity Priors with Per-Sample Confidence

Haichao Zhu, Qian Zhang

Abstract

Gravity estimation is fundamental to visual-inertial perception, augmented reality, and robotics, yet gravity priors from IMUs are often unreliable under linear acceleration, vibration, and transient motion. Existing methods typically either estimate gravity directly from images or assume reasonably accurate inertial input, leaving the practical problem of correcting a noisy gravity prior from a single image largely unaddressed. We present GravCal, a feedforward model for single-image gravity prior calibration. Given one RGB image and a noisy gravity prior, GravCal predicts a corrected gravity direction and a per-sample confidence score. The model combines two complementary predictions, a residual correction of the input prior and a prior-independent image estimate, and uses a learned gate to fuse them adaptively. Experiments show strong gains over raw inertial priors: GravCal reduces mean angular error from 22.02° (IMU prior) to 14.24°, with larger improvements when the prior is severely corrupted. We also introduce a new dataset of over 148K frames with paired VIO-derived ground-truth gravity and Mahony-filter IMU priors across diverse scenes and arbitrary camera orientations. The learned gate correlates with prior quality, making it a useful confidence signal for downstream systems.
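
For readers unfamiliar with the metric quoted above, the following is a minimal sketch of how a mean angular error between predicted and ground-truth gravity directions is commonly computed. The function name `angular_error_deg` and the toy inputs are illustrative assumptions, not taken from the paper, and the paper's own evaluation code may differ in details.

```python
import numpy as np

def angular_error_deg(g_pred, g_true):
    """Angle in degrees between gravity vectors, computed row by row.

    Hypothetical helper: inputs have shape (..., 3) and need not be unit length.
    """
    g_pred = np.asarray(g_pred, dtype=float)
    g_true = np.asarray(g_true, dtype=float)
    # Normalize along the last axis so only direction matters.
    g_pred = g_pred / np.linalg.norm(g_pred, axis=-1, keepdims=True)
    g_true = g_true / np.linalg.norm(g_true, axis=-1, keepdims=True)
    # Clip the cosine to guard against rounding slightly outside [-1, 1].
    cos = np.clip(np.sum(g_pred * g_true, axis=-1), -1.0, 1.0)
    return np.degrees(np.arccos(cos))

# Toy data (not from the paper): one perfect prediction, one off by a right angle.
pred = np.array([[0.0, 0.0, -1.0], [1.0, 0.0, 0.0]])
true = np.array([[0.0, 0.0, -1.0], [0.0, 0.0, -1.0]])
errors = angular_error_deg(pred, true)
mean_error = float(errors.mean())
```

Averaging `errors` over a test set gives a single number comparable in kind to the 22.02° and 14.24° figures reported in the abstract.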

Paper Structure (44 sections, 16 equations, 4 figures, 5 tables)

Figures (4)

  • Figure 1: GravCal calibrates an IMU-derived gravity prior using a single image, at arbitrary camera orientations. Examples span sideways ($90^\circ$) to near-inverted ($180^\circ$) poses. GravCal reduces the prior error in a single forward pass, without any orientation-specific tuning.
  • Figure 2: GravCal Overview. The input image is encoded by an EfficientNet-B0 backbone. In the prior-guided branch, the IMU prior $\hat{\mathbf{g}}$ is mapped by a PriorMLP and used to FiLM-condition the visual feature, which is then refined by a prior-correction module to produce the corrected estimate $\mathbf{g}_{\mathrm{corr}}$. In parallel, the backbone feature is also used to regress an image-only estimate $\mathbf{g}_{\mathrm{img}}$. The final gravity $\mathbf{g}_{\mathrm{pred}}$ is obtained by fusing the corrected prior and the image-only estimate via adaptive gating $\tau$.
  • Figure 3: Distribution of gravity directions. The sphere heatmap shows directional density on $S^2$, and the marginal plots summarize angular coverage. The distribution spans a broad range of orientations rather than concentrating around upright poses.
  • Figure 4: Mean angular error (degrees $\downarrow$) on our test set, broken down by camera tilt angle in $30^\circ$ increments. GeoCalib excels near upright but collapses beyond $30^\circ$; the fused model benefits from the inertial prior and leads in four of six bins. Full numerical values are provided in the supplementary material.
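
The gated fusion described in the Figure 2 caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the captions do not give the fusion formula, so the sketch assumes a convex blend in which $\tau \in [0, 1]$ weights the corrected-prior branch, followed by renormalization to the unit sphere. The names `normalize` and `fuse_gravity` are hypothetical.

```python
import numpy as np

def normalize(v, eps=1e-8):
    """Scale a 3-vector to unit length (eps avoids division by zero)."""
    v = np.asarray(v, dtype=float)
    return v / max(np.linalg.norm(v), eps)

def fuse_gravity(g_corr, g_img, tau):
    """Blend two gravity estimates with a scalar gate.

    Assumption: tau = 1 trusts the corrected prior g_corr entirely,
    tau = 0 trusts the image-only estimate g_img entirely. The paper
    may define the gate differently.
    """
    tau = float(np.clip(tau, 0.0, 1.0))
    blended = tau * normalize(g_corr) + (1.0 - tau) * normalize(g_img)
    # Renormalize so the output is again a direction on the unit sphere.
    # Note: exactly opposite inputs with tau = 0.5 blend to the zero vector.
    return normalize(blended)

# Toy inputs (not from the paper).
g_corr = np.array([0.0, 0.1, -1.0])
g_img = np.array([0.1, 0.0, -1.0])
g_pred = fuse_gravity(g_corr, g_img, tau=0.7)
```

Under this assumed rule, a gate value near 1 indicates that the model relied mostly on the prior branch, which is one way the gate could serve as the per-sample confidence signal the abstract describes.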