
Neural Real-Time Recalibration for Infrared Multi-Camera Systems

Benyamin Mehmandar, Reza Talakoob, Charalambos Poullis

TL;DR

Experiments show that the proposed neural network-based calibration method is more accurate than traditional calibration techniques, with or without perturbations, while also running in real time, marking a significant leap in the field of real-time multi-camera system calibration.

Abstract

Currently, there are no learning-free or neural techniques for real-time recalibration of infrared multi-camera systems. In this paper, we address the challenge of real-time, highly accurate calibration of multi-camera infrared systems, a critical task for time-sensitive applications. Unlike traditional calibration techniques that lack adaptability and struggle with on-the-fly recalibrations, we propose a neural network-based method capable of dynamic real-time calibration. The proposed method integrates a differentiable projection model that directly correlates 3D geometries with their 2D image projections and facilitates the direct optimization of both intrinsic and extrinsic camera parameters. Key to our approach is the dynamic camera pose synthesis with perturbations in camera parameters, emulating realistic operational challenges to enhance model robustness. We introduce two model variants: one designed for multi-camera systems with onboard processing of 2D points, utilizing the direct 2D projections of 3D fiducials, and another for image-based systems, employing color-coded projected points to implicitly establish correspondence. Through rigorous experimentation, we demonstrate that our method is more accurate than traditional calibration techniques, with or without perturbations, while also being real-time, marking a significant leap in the field of real-time multi-camera system calibration. The source code can be found at https://github.com/theICTlab/neural-recalibration

Paper Structure

This paper contains 33 sections, 4 equations, 9 figures, 2 tables.

Figures (9)

  • Figure 1: Discontinuities introduced by quaternions and Euler angle representations hinder network learning efficiency as shown in Zhou_2019_CVPR.
  • Figure 2: Technical overview. Our methodology begins with the synthesis of dynamic camera poses (see top fig.). Given spherical angles $\theta$ (azimuth), $\phi$ (elevation), along with the intrinsic rotation angle $\alpha$, the OEM calibration parameters, the maximum perturbation limit $\kappa$, and known 3D fiducials (e.g. a cube calibration object), this module performs two primary functions: (i) it synthesizes poses for the multi-camera system, and (ii) it computes the projected 2D points. Subsequently, it employs point splatting to render images of these points. During training (see bottom fig.), the synthesized poses and projected points (alternatively rendered images) are used to train the neural network. A differentiable projection ensures the propagation of gradients from the loss $\mathcal{L}$ back to the predicted camera parameters.
  • Figure 3: Dynamic Camera Pose Synthesis. Our framework supports arbitrary configurations of multiple cameras as well as a wide range of calibration objects. To synthesize camera poses, we employ a random uniform sampling strategy across three dimensions to ensure a comprehensive exploration of the pose space: azimuth ($\theta$), elevation ($\phi$), and roll ($\alpha$), where $\theta \sim \mathcal{U}(0, 2\pi)$, $\phi \sim \mathcal{U}(0, \frac{\pi}{2})$, and $\alpha \sim \mathcal{U}(0, 2\pi)$. Additionally, Original Equipment Manufacturer (OEM) calibration parameters and a predefined maximum perturbation limit ($\kappa$) are incorporated.
  • Figure 4: Visualization of predicted vs ground truth camera poses. The calibration object is a sphere with 64 fiducials. The multi-camera system configuration is O-shaped comprising 10 cameras. For closer inspection please refer to the interactive visualization in the cameras.html file.
  • Figure 5: Runtime for traditional camera calibration. Calibration time grows exponentially with the number of images (red; 1 camera) and linearly with the number of Levenberg-Marquardt (LM) iterations on 100 images (blue; 1 camera). Ours runs in real time ($\tau_{min}^{i=1}=0.0026s$, $\tau_{max}^{i=10}=0.012s$) for an increasing number of cameras $2^{i}, 1 \leq i \leq 10$ (black).
  • ...and 4 more figures