GeoCalib: Learning Single-image Calibration with Geometric Optimization

Alexander Veicht, Paul-Edouard Sarlin, Philipp Lindenberger, Marc Pollefeys

TL;DR

GeoCalib is a deep neural network that leverages universal rules of 3D geometry through an optimization process. It is trained end-to-end to estimate camera parameters and learns to find useful visual cues from the data.

Abstract

From a single image, visual cues can help deduce intrinsic and extrinsic camera parameters like the focal length and the gravity direction. This single-image calibration can benefit various downstream applications like image editing and 3D mapping. Current approaches to this problem are based on either classical geometry with lines and vanishing points or on deep neural networks trained end-to-end. The learned approaches are more robust but struggle to generalize to new environments and are less accurate than their classical counterparts. We hypothesize that they lack the constraints that 3D geometry provides. In this work, we introduce GeoCalib, a deep neural network that leverages universal rules of 3D geometry through an optimization process. GeoCalib is trained end-to-end to estimate camera parameters and learns to find useful visual cues from the data. Experiments on various benchmarks show that GeoCalib is more robust and more accurate than existing classical and learned approaches. Its internal optimization estimates uncertainties, which help flag failure cases and benefit downstream applications like visual localization. The code and trained models are publicly available at https://github.com/cvg/GeoCalib.
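To make the two quantities named in the abstract concrete, the following minimal sketch relates a focal length to a field of view and expresses a gravity direction through roll and pitch angles. It assumes a pinhole camera with a centered principal point and a camera frame with x right, y down, z forward. The function names and the angle convention are illustrative assumptions, not taken from the paper or its code.

```python
import math

def focal_to_vfov_deg(focal_px, image_height_px):
    """Vertical field of view in degrees of a pinhole camera (centered principal point)."""
    return math.degrees(2.0 * math.atan(image_height_px / (2.0 * focal_px)))

def vfov_deg_to_focal(vfov_deg, image_height_px):
    """Inverse of focal_to_vfov_deg: focal length in pixels."""
    return image_height_px / (2.0 * math.tan(math.radians(vfov_deg) / 2.0))

def gravity_in_camera(roll, pitch):
    """Unit gravity direction in the camera frame (angles in radians).

    Assumed convention: x right, y down, z forward. With roll = pitch = 0 the
    camera is level and gravity points along +y (down in the image).
    """
    return (math.sin(roll) * math.cos(pitch),
            math.cos(roll) * math.cos(pitch),
            -math.sin(pitch))

print(focal_to_vfov_deg(600.0, 480))
print(gravity_in_camera(0.05, 0.15))
```

Under this convention, a single-image calibration reduces to estimating three numbers for a pinhole camera: roll, pitch, and focal length.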

Paper Structure (66 sections, 13 equations, 15 figures, 6 tables)

Figures (15)

  • Figure 1: Learning vs. geometry? To estimate the camera calibration from a single image, classical approaches struggle with environments devoid of lines while deep networks are so far not as accurate. GeoCalib combines the best of both: it learns to steer an optimization using diverse geometric and semantic cues learned end-to-end.
  • Figure 2: Architecture of GeoCalib. A DNN predicts a Perspective Field with confidences, to which camera parameters are fitted with a Levenberg-Marquardt optimization. GeoCalib is trained end-to-end by supervising the optimized parameters. Priors over some of them or a different distortion model can be easily included without retraining.
  • Figure 3: Good features to calibrate. We show the confidences learned by GeoCalib for both components of the Perspective Field. The up-vector is most confident near vertical lines or upright objects like trees. The latitude is most confident near the horizon.
  • Figure 4: Ranking images by uncertainty. We report the gravity error / uncertainty for 8 outdoor (top) and indoor (bottom) images from left-to-right, sorted by uncertainty. The estimated uncertainty correlates well with the ground truth error.
  • Figure 5: Qualitative results. We show five examples of GeoCalib's prediction on Stanford2D3D [armeni2017joint], TartanAir [tartanair2020iros], MegaDepth [li2018megadepth] and LaMAR [sarlin2022lamar] (x2). a-b) depict the generated up-vector and latitude field from ground-truth and estimated camera parameters. c-d) depict the latitude and up-vector fields observed by the DNN, with opacity from the learned confidences. Here, green denotes accurate and red inaccurate predicted fields w.r.t. ground-truth. GeoCalib learns to predict accurate fields and to discard observations in regions that are less informative, e.g. the floor.
  • ...and 10 more figures
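
The fitting step described in the Figure 2 caption can be sketched as a confidence-weighted Levenberg-Marquardt fit of camera parameters to a Perspective Field (per-pixel up-vector and latitude). This is a toy illustration under stated assumptions, not the authors' implementation. It assumes a pinhole camera without distortion, a camera frame with x right, y down, z forward, a finite-difference Jacobian, and a log-focal parameterization. All function names are hypothetical, and the observed field is synthetic, standing in for what the DNN would predict.

```python
import math

def up_in_camera(roll, pitch):
    # Assumed convention: x right, y down, z forward; roll = pitch = 0 gives up = (0, -1, 0).
    return (-math.sin(roll) * math.cos(pitch),
            -math.cos(roll) * math.cos(pitch),
            math.sin(pitch))

def perspective_field(params, pixels):
    """Per pixel (up_x, up_y, latitude) for a pinhole camera; params = (roll, pitch, log_f)."""
    roll, pitch, log_f = params
    f = math.exp(log_f)  # optimizing log f keeps the focal length positive
    gx, gy, gz = up_in_camera(roll, pitch)
    field = []
    for u, v in pixels:  # pixel coordinates relative to the principal point
        x, y = u / f, v / f
        s = (x * gx + y * gy + gz) / math.sqrt(x * x + y * y + 1.0)
        lat = math.asin(max(-1.0, min(1.0, s)))  # angle between the ray and the horizon plane
        ux, uy = gx - x * gz, gy - y * gz  # image direction of the projected up axis
        n = max(math.hypot(ux, uy), 1e-12)
        field.append((ux / n, uy / n, lat))
    return field

def residuals(params, pixels, observed, conf):
    # Stack the confidence-weighted differences between predicted and observed fields.
    res = []
    for pred, obs, c in zip(perspective_field(params, pixels), observed, conf):
        w = math.sqrt(c)
        res.extend(w * (p - o) for p, o in zip(pred, obs))
    return res

def solve3(A, b):
    # Gaussian elimination with partial pivoting for a 3x3 linear system.
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for i in range(3):
        piv = max(range(i, 3), key=lambda r: abs(M[r][i]))
        M[i], M[piv] = M[piv], M[i]
        for r in range(i + 1, 3):
            k = M[r][i] / M[i][i]
            for c in range(i, 4):
                M[r][c] -= k * M[i][c]
    x = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        x[i] = (M[i][3] - sum(M[i][c] * x[c] for c in range(i + 1, 3))) / M[i][i]
    return x

def levenberg_marquardt(params, pixels, observed, conf, iters=100, lam=1e-3):
    params = list(params)
    cost = sum(e * e for e in residuals(params, pixels, observed, conf))
    for _ in range(iters):
        r = residuals(params, pixels, observed, conf)
        cols = []  # Jacobian columns from forward differences
        for k in range(3):
            shifted = params[:]
            shifted[k] += 1e-6
            rk = residuals(shifted, pixels, observed, conf)
            cols.append([(a - b) / 1e-6 for a, b in zip(rk, r)])
        JtJ = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(3)] for i in range(3)]
        Jtr = [sum(a * b for a, b in zip(cols[i], r)) for i in range(3)]
        # Marquardt damping: scale the diagonal of the normal equations.
        A = [[JtJ[i][j] * (1.0 + lam if i == j else 1.0) for j in range(3)] for i in range(3)]
        step = solve3(A, [-g for g in Jtr])
        if max(abs(s) for s in step) < 1e-12:
            break
        trial = [p + s for p, s in zip(params, step)]
        trial_cost = sum(e * e for e in residuals(trial, pixels, observed, conf))
        if trial_cost < cost:  # accept the step and trust the linearization more
            params, cost, lam = trial, trial_cost, lam * 0.1
        else:  # reject the step and increase the damping
            lam *= 10.0
    return params

# Synthetic check: generate a field from known parameters, then recover them.
pixels = [(u, v) for u in range(-300, 301, 100) for v in range(-200, 201, 100)]
truth = (0.05, 0.15, math.log(600.0))
observed = perspective_field(truth, pixels)
confidence = [1.0] * len(pixels)  # uniform here; a network would predict these per pixel
estimate = levenberg_marquardt((0.0, 0.0, math.log(400.0)), pixels, observed, confidence)
print("roll, pitch, focal:", estimate[0], estimate[1], math.exp(estimate[2]))
```

In this sketch, lowering the confidence of a pixel shrinks its residuals and therefore its influence on the fit, which mirrors the role the captions attribute to the learned confidences.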