fCOP: Focal Length Estimation from Category-level Object Priors

Xinyue Zhang, Jiaqi Yang, Xiangting Meng, Abdelrahman Mohamed, Laurent Kneip

TL;DR

The paper tackles monocular focal length estimation from a single image without strong scene geometry priors by leveraging category-level object priors and monocular depth predictions. It introduces fCOP, a minimal closed-form solver that recovers focal length from triplets of correspondences while decoupling focal length from object scale and pose, and it embeds robust estimation via Interval Stabbing with frame-wise consistency enforcement. Comprehensive experiments on synthetic data and real datasets (REAL275 and MultiFocals) show state-of-the-art focal-length accuracy and strong generalization to out-of-domain data, outperforming existing monocular intrinsic estimation methods. The estimated focal length is demonstrated to improve downstream category-level object pose estimation using RGB-D cues, highlighting practical impact for 3D understanding with uncalibrated monocular input.

Abstract

In the realm of computer vision, the perception and reconstruction of the 3D world through vision signals heavily rely on camera intrinsic parameters, which have long been a subject of intense research within the community. In practical applications, without a strong scene geometry prior like the Manhattan World assumption or special artificial calibration patterns, monocular focal length estimation becomes a challenging task. In this paper, we propose a method for monocular focal length estimation using category-level object priors. Based on two well-studied existing tasks: monocular depth estimation and category-level object canonical representation learning, our focal solver takes depth priors and object shape priors from images containing objects and estimates the focal length from triplets of correspondences in closed form. Our experiments on simulated and real world data demonstrate that the proposed method outperforms the current state-of-the-art, offering a promising solution to the long-standing monocular focal length estimation problem.
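
The closed-form triplet estimate can be sketched as follows. This is a minimal illustration under stated assumptions, not necessarily the authors' exact formulation: it assumes a pinhole camera with a known principal point, 2D image points $\mathbf{x}_i$ measured in pixels relative to that principal point, predicted depths $d_i$ taken along the optical axis, and canonical object points $\mathbf{p}_i$. Since the object pose is a similarity transform, each pair of points satisfies $\frac{1}{f^2}\|d_i\mathbf{x}_i - d_j\mathbf{x}_j\|^2 + (d_i - d_j)^2 = s^2\|\mathbf{p}_i - \mathbf{p}_j\|^2$, which is linear in $a = 1/f^2$ and $b = s^2$ and does not involve rotation or translation. The function name `focal_from_triplet` is hypothetical.

```python
import numpy as np


def focal_from_triplet(x, d, p):
    """Estimate (f, s) from three correspondences (x_i, d_i, p_i).

    x: (3, 2) pixel coordinates relative to the principal point (assumed known)
    d: (3,)   depths along the optical axis
    p: (3, 3) canonical object points
    Returns (f, s), or None if the triplet is degenerate.
    """
    x, d, p = (np.asarray(v, dtype=float) for v in (x, d, p))
    rows, rhs = [], []
    for i, j in ((0, 1), (0, 2), (1, 2)):
        # Squared in-plane distance of the back-projected points, times f^2.
        m = np.sum((d[i] * x[i] - d[j] * x[j]) ** 2)
        # Squared distance of the same pair in canonical space.
        q = np.sum((p[i] - p[j]) ** 2)
        # a * m - b * q = -(d_i - d_j)^2, with a = 1/f^2 and b = s^2.
        rows.append([m, -q])
        rhs.append(-(d[i] - d[j]) ** 2)
    # Three pairwise equations, two unknowns: solve in the least-squares sense.
    (a, b), *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    if a <= 0 or b <= 0:
        return None  # no physically valid focal length or scale
    return 1.0 / np.sqrt(a), np.sqrt(b)
```

When all three points have the same depth, the right-hand side vanishes and only the ratio of $a$ to $b$ is constrained, so the sketch returns `None` for such triplets.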

Paper Structure (13 sections, 12 equations, 7 figures, 3 tables, 1 algorithm)

This paper contains 13 sections, 12 equations, 7 figures, 3 tables, 1 algorithm.

Figures (7)

  • Figure 1: Overview of the pipeline for focal length estimation using category-level object priors. (a) The input to our approach is an RGB image containing objects of known categories. (b) Utilizing state-of-the-art monocular depth and normalized object coordinates (NOCs) predictors, we obtain the depth $d_i$ and the 3D canonical point $\mathbf{p}_i$ for each observable 2D image point $\mathbf{x}_i$ on the objects. (c) The correspondences $(\mathbf{x}_i, d_i, \mathbf{p}_i)$ are constrained by the unknown intrinsic parameters and the object's pose. (d) The proposed focal length minimal solver $f$COP. For any two points on the object, the distance in the camera frame and in 3D space is determined solely by the focal length and object scale. By using a triplet of correspondences, the focal length can be estimated.
  • Figure 2: Illustration of scene-wise focal length estimation. When multiple objects are scattered across one image frame, the depth and NOCs may be contaminated with noise, so the focal length estimated for each object is centered on a perturbed value and different objects yield different focal lengths. Applying Interval Stabbing to the focal lengths estimated from all objects provides a consistent and accurate estimate.
  • Figure 3: Back-projecting the same image of a single object with different focal lengths yields different object shapes in 3D space. With a larger focal length, the object is flattened along the $x$- and $y$-axes. Consequently, more outliers are counted as inliers in the right example.
  • Figure 4: Simulation results. The focal length solver is stable on noise-free data, and the corresponding poses estimated by the Umeyama method are also stable. Focal length errors are shown as a function of depth and NOCs noise.
  • Figure 5: Example images from the real datasets. The first row shows images from REAL275 [wang2019normalized], whose scenes contain large planes and all use the same focal length of $590$px. The second row shows images from the proposed dataset, MultiFocals. As its name suggests, the dataset was collected with $31$ different focal lengths ranging from $533$px to $1390$px in daily scenes.
  • ...and 2 more figures
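
The Interval Stabbing step mentioned in Figure 2 can be illustrated with a one-dimensional sweep that finds a value contained in the largest number of intervals. How the paper constructs its intervals is not stated in this summary; the sketch below assumes each per-object estimate $f_k$ contributes the interval $[f_k - \epsilon, f_k + \epsilon]$, where the tolerance `eps` and the function names `stab_intervals` and `consensus_focal` are hypothetical.

```python
def stab_intervals(intervals):
    """Return (point, count): a point lying in the largest number of closed intervals."""
    events = []
    for lo, hi in intervals:
        events.append((lo, 0))  # 0 = interval opens; sorts before a close at the same position
        events.append((hi, 1))  # 1 = interval closes
    events.sort()

    best_count, count = 0, 0
    best_lo = best_hi = None
    for idx, (pos, kind) in enumerate(events):
        if kind == 0:
            count += 1
            if count > best_count:
                best_count = count
                # The current overlap lasts until the next event position.
                best_lo, best_hi = pos, events[idx + 1][0]
        else:
            count -= 1

    if best_count == 0:
        return None, 0
    # Report the midpoint of the maximally stabbed region.
    return 0.5 * (best_lo + best_hi), best_count


def consensus_focal(estimates, eps):
    """Stab the intervals [f - eps, f + eps] built from per-object focal estimates."""
    return stab_intervals([(f - eps, f + eps) for f in estimates])
```

For example, `consensus_focal([598.0, 602.0, 600.0, 605.0, 750.0, 420.0], eps=5.0)` returns a focal length supported by the four estimates near 600 px and ignores the two outliers. Sorting dominates the cost, so the sweep runs in $O(n \log n)$ for $n$ intervals.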

Theorems & Definitions (2)

  • Remark 1
  • Remark 2