Deep Single Image Camera Calibration by Heatmap Regression to Recover Fisheye Images Under Manhattan World Assumption
Nobuhiko Wakai, Satoshi Sato, Yasunori Ishii, Takayoshi Yamashita
TL;DR
This work tackles single-image camera calibration for fisheye images under the Manhattan world assumption. It pairs a heatmap-based estimator of vanishing points and auxiliary diagonal points (VP/ADP) with a distortion estimator, so that camera rotation can be recovered and fisheye distortion removed from a single distorted image. The auxiliary diagonal points (ADPs) are arranged with octahedral symmetry for 3D spatial uniformity and compensate for the scarcity of vanishing points in an image, which makes extrinsic and intrinsic estimation more robust. On large-scale outdoor datasets and off-the-shelf cameras, the approach outperforms geometry-based and previous learning-based methods in rotation accuracy and reprojection error, and it recovers images consistently. The method targets robust calibration that needs no known camera parameters for perception tasks in urban environments, with potential extensions to indoor scenes and multi-view inputs.
Abstract
A Manhattan world, in which scene structure lies along cuboid buildings, is useful for camera angle estimation. However, accurate and robust angle estimation from fisheye images in a Manhattan world has remained an open challenge because general scene images tend to lack constraints such as lines, arcs, and vanishing points. To achieve higher accuracy and robustness, we propose a learning-based calibration method that uses heatmap regression, similar to keypoint-based pose estimation, to detect the directions of labeled image coordinates. Simultaneously, our two estimators recover the rotation and remove fisheye distortion by remapping from a general scene image. We find that learning-based methods, which are not bound by vanishing-point constraints, allow additional points to be defined. To compensate for the lack of vanishing points in images, we introduce auxiliary diagonal points whose 3D arrangement is optimal in terms of spatial uniformity. Extensive experiments demonstrate that our method outperforms conventional methods on large-scale datasets and with off-the-shelf cameras.
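The geometry behind the auxiliary diagonal points can be illustrated with a minimal sketch. It is not the authors' implementation and rests on stated assumptions: the vanishing points are taken to be the six signed axis directions of the Manhattan frame, the auxiliary diagonal points are taken to be the eight cube-diagonal directions (one arrangement with octahedral symmetry), and rotation is recovered by a standard Kabsch/SVD alignment chosen here for illustration. The function names are hypothetical, and heatmap regression itself is not modelled; the detected directions are assumed to be given as unit vectors.

```python
import numpy as np


def manhattan_directions():
    """Return assumed VP and ADP directions as unit vectors.

    Assumption: 6 VPs along the signed Manhattan axes and
    8 ADPs along the cube diagonals (+-1, +-1, +-1) / sqrt(3).
    """
    vps = np.array(
        [[1, 0, 0], [-1, 0, 0], [0, 1, 0], [0, -1, 0], [0, 0, 1], [0, 0, -1]],
        dtype=float,
    )
    signs = np.array(
        [[sx, sy, sz] for sx in (1, -1) for sy in (1, -1) for sz in (1, -1)],
        dtype=float,
    )
    adps = signs / np.sqrt(3.0)
    return vps, adps


def rotation_from_directions(world_dirs, camera_dirs):
    """Kabsch alignment: find R minimizing sum ||R w_i - c_i||^2.

    world_dirs, camera_dirs: (N, 3) arrays of matched unit vectors.
    """
    h = world_dirs.T @ camera_dirs  # 3x3 cross-covariance, sum of w_i c_i^T
    u, _, vt = np.linalg.svd(h)
    # Force a proper rotation (determinant +1) rather than a reflection.
    d = np.sign(np.linalg.det(vt.T @ u.T))
    return vt.T @ np.diag([1.0, 1.0, d]) @ u.T


def rot_z(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])


def rot_x(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1.0, 0.0, 0.0], [0.0, c, -s], [0.0, s, c]])


vps, adps = manhattan_directions()

# Spatial uniformity check: the scatter matrix of all 14 directions
# is a multiple of the identity, so no axis is favoured.
all_dirs = np.vstack([vps, adps])
scatter = all_dirs.T @ all_dirs

# Hypothetical scenario: only two VPs are detected, plus three ADPs.
world_dirs = np.vstack([vps[[0, 4]], adps[[0, 1, 2]]])
R_true = rot_z(0.3) @ rot_x(-0.2)
camera_dirs = world_dirs @ R_true.T  # c_i = R_true w_i, noise-free
R_est = rotation_from_directions(world_dirs, camera_dirs)
```

In this noise-free toy case the rotation is recovered from two vanishing points and three diagonal points. With real detections the directions would be noisy, and the least-squares alignment would return only an approximate rotation.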
