Deep Single Image Camera Calibration by Heatmap Regression to Recover Fisheye Images Under Manhattan World Assumption

Nobuhiko Wakai, Satoshi Sato, Yasunori Ishii, Takayoshi Yamashita

TL;DR

This work tackles single-image camera calibration for fisheye images in a Manhattan world. It introduces a heatmap-based estimator for vanishing points (VPs) and auxiliary diagonal points (ADPs), coupled with a distortion estimator, to recover the camera rotation and remove distortion from a single distorted image. The ADPs are arranged with octahedral symmetry to cover 3D directions uniformly. They compensate for scarce visible vanishing points and enable robust estimation of both extrinsics and intrinsics. On large outdoor datasets and with off-the-shelf cameras, the approach outperforms geometry-based and previous learning-based methods in rotation accuracy and reprojection error, while delivering consistent image recovery. The method promises robust calibration without known camera parameters for perception tasks in urban environments, with potential extensions to indoor scenes and multi-view inputs.
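The octahedral arrangement can be made concrete with a short sketch. The sketch assumes that the VPs lie on the six ±X/±Y/±Z axis directions of the Manhattan frame. It also assumes that the eight ADPs lie on the cube-diagonal directions (±1, ±1, ±1)/√3. Both are plausible readings, not the paper's exact coordinate table, and the function names are hypothetical. Figure 5 reports five VP heatmaps, so the paper's actual label set may differ from these six axis directions.

```python
import itertools
import math

# Assumed layout (not copied from the paper): VPs on the six Manhattan axes
# (octahedron vertices), ADPs on the eight cube diagonals (octahedron face centres).

def vanishing_point_directions():
    """Six unit vectors +X, -X, +Y, -Y, +Z, -Z of the Manhattan frame."""
    dirs = []
    for axis in range(3):
        for sign in (1.0, -1.0):
            v = [0.0, 0.0, 0.0]
            v[axis] = sign
            dirs.append(tuple(v))
    return dirs

def auxiliary_diagonal_directions():
    """Eight unit vectors (+/-1, +/-1, +/-1) / sqrt(3)."""
    s = 1.0 / math.sqrt(3.0)
    return [(sx * s, sy * s, sz * s)
            for sx, sy, sz in itertools.product((1.0, -1.0), repeat=3)]

def min_angle_deg(points):
    """Smallest pairwise angle in degrees between unit vectors (a crude uniformity check)."""
    best = 180.0
    for a, b in itertools.combinations(points, 2):
        dot = max(-1.0, min(1.0, sum(x * y for x, y in zip(a, b))))
        best = min(best, math.degrees(math.acos(dot)))
    return best

vps = vanishing_point_directions()
adps = auxiliary_diagonal_directions()
print(len(vps), len(adps))                # -> 6 8
print(round(min_angle_deg(vps + adps), 2))  # -> 54.74
```

The smallest gap, acos(1/√3) ≈ 54.74°, lies between an axis direction and its neighbouring diagonals. Both subsets are invariant under the same axis permutations and sign flips, which is what the octahedral symmetry refers to.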

Abstract

A Manhattan world, in which scenes are aligned with cuboid buildings, is useful for camera angle estimation. However, accurate and robust angle estimation from fisheye images in a Manhattan world has remained an open challenge because general scene images tend to lack constraints such as lines, arcs, and vanishing points. To achieve higher accuracy and robustness, we propose a learning-based calibration method that uses heatmap regression, similar to keypoint-based pose estimation, to detect the directions of labeled image coordinates. Our two estimators simultaneously recover the rotation and remove fisheye distortion from a general scene image by remapping. We find that learning-based methods can define additional points that are not bound by vanishing-point constraints. To compensate for the lack of vanishing points in images, we introduce auxiliary diagonal points that have the optimal 3D arrangement of spatial uniformity. Extensive experiments demonstrate that our method outperforms conventional methods on large-scale datasets and with off-the-shelf cameras.
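Keypoint-style heatmap regression typically trains a network to output one Gaussian-shaped map per labeled point and then reads the point location back from that map. The sketch below shows only the target rendering and decoding step, in plain Python. The Gaussian width `sigma`, the soft-argmax temperature `beta`, and all function names are illustrative assumptions, not values or decoders reported in the paper.

```python
import math

def gaussian_heatmap(height, width, center, sigma=2.0):
    """Training target for one keypoint: a 2D Gaussian peaked at center = (x, y)."""
    cx, cy = center
    return [[math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
             for x in range(width)]
            for y in range(height)]

def argmax_decode(heatmap):
    """Integer (x, y) of the heatmap maximum."""
    best_val, best_xy = -math.inf, (0, 0)
    for y, row in enumerate(heatmap):
        for x, val in enumerate(row):
            if val > best_val:
                best_val, best_xy = val, (x, y)
    return best_xy

def soft_argmax_decode(heatmap, beta=10.0):
    """Sub-pixel (x, y) as the softmax-weighted mean of pixel coordinates."""
    peak = max(max(row) for row in heatmap)
    total = sx = sy = 0.0
    for y, row in enumerate(heatmap):
        for x, val in enumerate(row):
            w = math.exp(beta * (val - peak))  # subtract the peak for numerical stability
            total += w
            sx += w * x
            sy += w * y
    return sx / total, sy / total

hm = gaussian_heatmap(24, 32, center=(12.3, 7.6))
print(argmax_decode(hm))       # -> (12, 8)
print(soft_argmax_decode(hm))  # sub-pixel estimate close to (12.3, 7.6)
```

Argmax decoding is quantised to the pixel grid. Soft-argmax recovers sub-pixel positions but is slightly biased toward the image centre by low background weights. Either could plausibly be used to turn each VP/ADP heatmap into an image coordinate.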

Paper Structure (32 sections, 6 equations, 18 figures, 20 tables)

Figures (18)

  • Figure 1: Our network estimates extrinsics and intrinsics in a Manhattan world from a single image. Our estimated camera parameters are used to fully recover images by remapping them while distinguishing the front and side directions on the basis of the Manhattan world. Cyan, magenta, and yellow lines indicate the three orthogonal planes of the Manhattan frame in each of the images. The input image is generated from Mirowski2019.
  • Figure 2: Definition of world coordinates in a Manhattan world. The coordinate systems of the Manhattan world and the camera are $O_M$-$X_MY_MZ_M$ and $O_C$-$X_CY_CZ_C$, with origins $O_M$ and $O_C$, respectively. All walls of the cuboid buildings are parallel to the corresponding coordinate planes of $O_M$-$X_MY_MZ_M$.
  • Figure 3: Coordinates of VPs and ADPs in a Manhattan world. The labels of the VPs and ADPs correspond to those listed in the table of VP/ADP coordinates.
  • Figure 4: Calibration pipeline for inference. The distortion estimator estimates the intrinsics. Using these intrinsics, the camera model backprojects the detected VP/ADPs onto the unit sphere, and the extrinsics are then computed by fitting the backprojected points to their Manhattan-world coordinates. The input fisheye image is generated from Mirowski2019.
  • Figure 5: Qualitative results of VP/ADP detection by the proposed VP estimator on the SL-MH test set. The VP estimator outputs one heatmap per point: five VP heatmaps and eight ADP heatmaps.
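The Figure 4 inference pipeline, backprojection onto the unit sphere followed by a rotation fit, can be sketched end to end on synthetic data. The sketch rests on several assumptions:

- an equidistant fisheye model, r = f * theta;
- VP/ADP directions taken as the six axis and eight cube-diagonal directions;
- Horn's closed-form quaternion method to fit the rotation, with power iteration standing in for an eigen-solver.

The paper's own camera model and fitting step may differ. Every function name, focal length, principal point, and field-of-view threshold here is hypothetical.

```python
import math

# Assumption: equidistant fisheye model r = f * theta, principal point (cx, cy).

def project_equidistant(d, f, cx, cy):
    """Camera-frame unit bearing d -> pixel (u, v)."""
    theta = math.acos(max(-1.0, min(1.0, d[2])))
    phi = math.atan2(d[1], d[0])
    return cx + f * theta * math.cos(phi), cy + f * theta * math.sin(phi)

def backproject_equidistant(u, v, f, cx, cy):
    """Pixel (u, v) -> unit bearing on the sphere (inverse of the projection above)."""
    x, y = u - cx, v - cy
    r = math.hypot(x, y)
    if r == 0.0:
        return (0.0, 0.0, 1.0)
    s = math.sin(r / f)
    return (s * x / r, s * y / r, math.cos(r / f))

def quat_to_matrix(w, x, y, z):
    """Unit quaternion (w, x, y, z) -> 3x3 rotation matrix as nested lists."""
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def apply(R, v):
    return tuple(sum(R[i][k] * v[k] for k in range(3)) for i in range(3))

def fit_rotation(world_dirs, cam_dirs, iters=200):
    """Least-squares R with R @ a ~= b (Horn's quaternion method); the top
    eigenvector of Horn's 4x4 matrix is found by power iteration."""
    S = [[sum(a[i] * b[j] for a, b in zip(world_dirs, cam_dirs)) for j in range(3)]
         for i in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = S
    n = float(len(world_dirs))  # |eigenvalues| <= n for unit vectors; +n makes all >= 0
    N = [
        [sxx + syy + szz + n, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz + n, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz + n, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz + n],
    ]
    q = [1.0, 0.3, 0.2, 0.1]  # arbitrary start, assumed not orthogonal to the solution
    for _ in range(iters):
        q = [sum(N[i][k] * q[k] for k in range(4)) for i in range(4)]
        norm = math.sqrt(sum(c * c for c in q))
        q = [c / norm for c in q]
    return quat_to_matrix(*q)

# Synthetic demo: assumed VP/ADP directions, a known rotation, and a hypothetical camera.
s = 1.0 / math.sqrt(3.0)
world = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
world += [(sx * s, sy * s, sz * s) for sx in (1, -1) for sy in (1, -1) for sz in (1, -1)]
half = math.radians(20.0) / 2.0
axis = (0.6, 0.0, 0.8)  # unit rotation axis
R_true = quat_to_matrix(math.cos(half), *(math.sin(half) * c for c in axis))
f, cx, cy = 300.0, 640.0, 480.0
pairs = []
for a in world:
    b = apply(R_true, a)
    if b[2] > -0.1:  # keep points inside an assumed ~190-degree field of view
        u, v = project_equidistant(b, f, cx, cy)
        pairs.append((a, backproject_equidistant(u, v, f, cx, cy)))
R_est = fit_rotation([a for a, _ in pairs], [b for _, b in pairs])
err = max(abs(R_est[i][j] - R_true[i][j]) for i in range(3) for j in range(3))
print(len(pairs), err < 1e-9)  # -> 8 True
```

With this rotation, eight of the fourteen assumed directions fall inside the field of view. The fitted rotation then matches the ground truth to numerical precision. On real heatmap detections the backprojected points are noisy, and the same least-squares fit returns the best rotation in the Frobenius sense.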