U-ARE-ME: Uncertainty-Aware Rotation Estimation in Manhattan Environments

Aalok Patwardhan, Callum Rhodes, Gwangbin Bae, Andrew J. Davison

TL;DR

U-ARE-ME tackles monocular camera rotation estimation without depth or intrinsics by exploiting Manhattan World priors and per-pixel surface normals with learned uncertainty. It combines single-frame, uncertainty-weighted rotation estimation on $SO(3)$ with a robust multi-frame sliding-window factor graph that enforces temporal consistency and accounts for varying information content across frames. The method achieves accuracy competitive with RGB‑D approaches and greater robustness to dropped frames and non-Manhattan regions, validated on ICL-NUIM, TUM RGB‑D, and ScanNet. It extends to applications such as up-vector estimation, horizon detection in non-inertial frames, and ground segmentation. Its per-frame uncertainty estimates further stabilize long sequences and support real-time deployment on RGB-only data.

Abstract

Camera rotation estimation from a single image is a challenging task, often requiring depth data and/or camera intrinsics, which are generally not available for in-the-wild videos. Although external sensors such as inertial measurement units (IMUs) can help, they often suffer from drift and are not applicable in non-inertial reference frames. We present U-ARE-ME, an algorithm that estimates camera rotation along with uncertainty from uncalibrated RGB images. Using a Manhattan World assumption, our method leverages the per-pixel geometric priors encoded in single-image surface normal predictions and performs optimisation over the SO(3) manifold. Given a sequence of images, we can use the per-frame rotation estimates and their uncertainty to perform multi-frame optimisation, achieving robustness and temporal consistency. Our experiments demonstrate that U-ARE-ME performs comparably to RGB-D methods and is more robust than sparse feature-based SLAM methods. We encourage the reader to view the accompanying video at https://callum-rhodes.github.io/U-ARE-ME for a visual overview of our method.
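
To make the single-frame step in the abstract concrete, here is a minimal sketch of uncertainty-weighted rotation estimation from surface normals under a Manhattan World assumption. It is not the authors' implementation: the cost function (zero when each normal is parallel to one Manhattan axis and perpendicular to the other two), the use of a per-pixel confidence `kappa` as a simple weight, and the gradient-descent optimiser with a numerical gradient are all assumptions chosen for illustration. The names `so3_exp`, `manhattan_cost`, and `estimate_rotation` are hypothetical.

```python
import numpy as np

def so3_exp(w):
    """Rodrigues' formula: map an axis-angle vector w (3,) to a rotation matrix."""
    theta = np.linalg.norm(w)
    K = np.array([[0.0, -w[2], w[1]],
                  [w[2], 0.0, -w[0]],
                  [-w[1], w[0], 0.0]])
    if theta < 1e-12:
        return np.eye(3) + K
    return (np.eye(3) + (np.sin(theta) / theta) * K
            + ((1.0 - np.cos(theta)) / theta**2) * (K @ K))

def manhattan_cost(R, normals, kappa):
    """Assumed cost (not taken from the paper): for each unit normal, sum
    c^2 (1 - c^2) over the three axes, where c is the cosine between the
    normal and a Manhattan axis (a column of R). Weighted by kappa."""
    c = normals @ R                                # (N, 3) cosines
    per_pixel = np.sum(c**2 * (1.0 - c**2), axis=1)
    return float(np.sum(kappa * per_pixel) / np.sum(kappa))

def estimate_rotation(normals, kappa, R_init=None, iters=100, eps=1e-6):
    """Gradient descent on SO(3): perturb R on the right by exp(d) and step
    along the negative central-difference gradient, with step halving."""
    R = np.eye(3) if R_init is None else R_init.copy()
    for _ in range(iters):
        f0 = manhattan_cost(R, normals, kappa)
        g = np.zeros(3)
        for k in range(3):
            d = np.zeros(3)
            d[k] = eps
            g[k] = (manhattan_cost(R @ so3_exp(d), normals, kappa)
                    - manhattan_cost(R @ so3_exp(-d), normals, kappa)) / (2 * eps)
        if np.linalg.norm(g) < 1e-10:
            break
        step, improved = 0.25, False
        while step > 1e-8:
            R_try = R @ so3_exp(-step * g)
            if manhattan_cost(R_try, normals, kappa) < f0:
                R, improved = R_try, True
                break
            step *= 0.5
        if not improved:
            break
    return R
```

Because the Manhattan frame is only defined up to the 24 axis-aligned symmetries of a cube, a local optimiser like this one returns the minimum nearest its initialisation, which is one reason to initialise each frame from the previous estimate in a video.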

Paper Structure (30 sections, 8 equations, 6 figures, 3 tables)

Figures (6)

  • Figure 1: U-ARE-ME provides globally consistent rotation estimates in Manhattan environments across sequences of RGB images, without camera intrinsics. For each frame, we estimate the rotation from the predicted surface normals along with pixel-wise uncertainty and enforce temporal consistency via a factor graph.
  • Figure 2: IMU sensors are prone to drift, especially in non-inertial frames of reference (e.g. inside a moving vehicle). The horizon line in each image represents the line perpendicular to the up-vector inferred from our proposed method (middle) and an IMU sensor (right).
  • Figure 3: (left) The cost function defined by a single normal vector. The cost is minimised when the (rotated) principal axes are parallel or perpendicular to the normal. (middle) A comparison of the shapes of different probability distributions defined over the unit sphere. (right) The cost function defined by three mutually orthogonal Manhattan axes.
  • Figure 4: The multi-frame optimisation process. Single-frame rotation and covariance estimates are used to initialise a sliding window factor graph in order to provide temporal consistency between frames and reject outlier measurements. Robust factors are shown along orange edges on measurements. The latest frame is then used to initialise the rotation estimate for the next frame.
  • Figure 5: (left) U-ARE-ME and ORB-SLAM accuracy comparison on 100 sequences from the ScanNet dataset. Solid lines show the cumulative number of runs below a certain accuracy threshold for U-ARE-ME; dashed lines show the same for ORB-SLAM. Percentage pass rate is shown, whereby at least X% of frames per sequence must contain a valid rotation estimate (this includes any initialisation and loss of tracking). (right) Ablation study experiments. The blue line shows the results of the full pipeline. 'single' means that multi-frame optimisation is disabled and 'no $\kappa$' means that the uncertainty weighting in the cost function is removed.
  • ...and 1 more figure
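
The multi-frame step described in the Figure 4 caption can be illustrated with a much-simplified stand-in for a factor graph. The sketch below assumes per-frame rotation measurements with information (inverse covariance) matrices, linearises them in the tangent space of the first measurement, and solves a small least-squares problem with a Huber-weighted measurement term and a constant-rotation smoothness term between consecutive frames. The smoothness prior, the Huber threshold, the weights, and the names `so3_log` and `smooth_window` are assumptions for illustration only; the source does not specify the factors the authors use.

```python
import numpy as np

def so3_exp(w):
    """Rodrigues' formula: axis-angle vector (3,) to rotation matrix."""
    theta = np.linalg.norm(w)
    K = np.array([[0.0, -w[2], w[1]],
                  [w[2], 0.0, -w[0]],
                  [-w[1], w[0], 0.0]])
    if theta < 1e-12:
        return np.eye(3) + K
    return (np.eye(3) + (np.sin(theta) / theta) * K
            + ((1.0 - np.cos(theta)) / theta**2) * (K @ K))

def so3_log(R):
    """Inverse of so3_exp; assumes the rotation angle is well below pi."""
    theta = np.arccos(np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0))
    v = np.array([R[2, 1] - R[1, 2], R[0, 2] - R[2, 0], R[1, 0] - R[0, 1]])
    if theta < 1e-8:
        return 0.5 * v
    return theta / (2.0 * np.sin(theta)) * v

def smooth_window(meas, infos, smooth_weight=2000.0, huber_delta=1.5, irls_iters=10):
    """Fuse a window of rotation measurements (list of 3x3 matrices) with
    their information matrices via iteratively reweighted least squares."""
    T = len(meas)
    R_ref = meas[0]                                   # linearisation point
    z = np.array([so3_log(R_ref.T @ Z) for Z in meas])
    x = z.copy()
    L = smooth_weight * np.eye(3)
    for _ in range(irls_iters):
        H = np.zeros((3 * T, 3 * T))
        b = np.zeros(3 * T)
        for t in range(T):
            r = x[t] - z[t]
            m = np.sqrt(r @ infos[t] @ r)             # Mahalanobis residual
            w = 1.0 if m <= huber_delta else huber_delta / m   # Huber weight
            s = slice(3 * t, 3 * t + 3)
            H[s, s] += w * infos[t]
            b[s] += w * (infos[t] @ z[t])
        for t in range(T - 1):                        # smoothness between frames
            a, c = slice(3 * t, 3 * t + 3), slice(3 * t + 3, 3 * t + 6)
            H[a, a] += L
            H[c, c] += L
            H[a, c] -= L
            H[c, a] -= L
        x = np.linalg.solve(H, b).reshape(T, 3)
    return [R_ref @ so3_exp(xi) for xi in x]
```

In this sketch a frame whose measurement disagrees strongly with its neighbours receives a reduced Huber weight, so the smoothness term dominates there. A frame with a weak information matrix is likewise pulled towards its neighbours, which is how varying information content across frames enters the estimate.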