ESC: Evolutionary Stitched Camera Calibration in the Wild

Grzegorz Rypeść, Grzegorz Kurzejamski

TL;DR

ESC addresses the challenge of maintaining accurate multi-camera extrinsics on real football fields by combining segmentation of playfield lines with a 3D playfield model and an elitist $\mu + \lambda$ evolutionary optimization. The method jointly refines all camera poses to optimize a composite loss that rewards both stitch quality and projection accuracy, without assuming a flat field. Empirical results show ESC outperforms baselines in stitching quality and pose projection across diverse fields, and ablation confirms the importance of the 3D field model and the stitch loss. This approach enables robust, parallax-aware stitched views suitable for real-time sports broadcasting and analytics.

Abstract

This work introduces a novel end-to-end approach for estimating extrinsic parameters of cameras in multi-camera setups on real-life sports fields. We identify the source of significant calibration errors in multi-camera environments and address the limitations of existing calibration methods, particularly the disparity between theoretical models and actual sports field characteristics. We propose the Evolutionary Stitched Camera calibration (ESC) algorithm to bridge this gap. It consists of image segmentation followed by evolutionary optimization of a novel loss function, providing a unified and accurate multi-camera calibration solution with high visual fidelity. The resulting calibration allows the creation of virtual stitched views from multiple video sources, which is as important for practical applications as numerical accuracy. We demonstrate the superior performance of our approach compared to state-of-the-art methods across diverse real-life football fields with varying physical characteristics.

Paper Structure (18 sections, 4 equations, 7 figures, 3 tables)

Figures (7)

  • Figure 1: Calibration results for Zhang's method and ours. Our approach is faster and yields better stitch quality and visual fidelity, because Zhang's method does not take the stitch into account and assumes constant environmental conditions, e.g., that distortion coefficients are unaffected by temperature.
  • Figure 2: We consider a scenario in which two cameras are placed behind the middle line of an outdoor football field and were calibrated at time $t$. Due to adverse environmental conditions such as wind, thermal expansion, or wildlife, their calibrations are no longer valid at time $t+1$. This causes errors when stitching the videos from these cameras. We use our novel ESC method to refine the calibrations and improve the quality of the stitched video.
  • Figure 3: Our playfield model consists of four planes. The model approximates real playfields well, allowing us to achieve greater precision than baseline methods, which assume the playfield is flat.
  • Figure 4: Our method of calibrating cameras at time $t$ consists of two phases. In the first, we segment camera images with a deep convolutional network to find playfield lines. In the second, we use an elitist $\mu + \lambda$ evolution strategy to find each camera's rotation and translation vectors that minimize the loss function. For $N$ cameras (here $N=2$), the loss function consists of $N+1$ summands: one per camera, measuring how well the warped segmented image aligns with the playfield template, and one stitch term, measuring how well the warped images of all cameras align with each other. The template is represented by white field lines on a black background. To warp images, we use projection matrices calculated from individuals, i.e., the candidate solutions of the evolution strategy.
  • Figure 5: Evaluation of methods at different hours during the day. ESC achieves much lower translation and rotation errors compared to other approaches at any time of the day. This results in better projections and stitch quality.
  • ...and 2 more figures
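
The optimization phase described in the Figure 4 caption can be illustrated with a minimal sketch of an elitist $\mu + \lambda$ evolution strategy. This is not the authors' implementation. The actual loss compares warped segmentation images with a playfield template, whereas here a toy squared-distance surrogate stands in for it so that the example stays self-contained. The names `composite_loss` and `mu_plus_lambda`, the target poses, and all hyperparameters (`mu`, `lam`, `sigma`, `generations`) are assumptions made for illustration only. What the sketch does preserve is the structure: each individual holds a rotation and a translation vector per camera, the loss has $N+1$ summands, and parents compete with their offspring for survival.

```python
import random


def composite_loss(individual, targets):
    """Toy surrogate loss with N + 1 summands for N cameras (assumption)."""
    n = len(targets)
    # Each camera owns 6 genes: 3 for rotation, 3 for translation.
    poses = [individual[6 * i: 6 * i + 6] for i in range(n)]
    residuals = [[p - t for p, t in zip(pose, target)]
                 for pose, target in zip(poses, targets)]
    # N per-camera terms: stand-in for aligning a warped image with the template.
    per_camera = [sum(r * r for r in res) for res in residuals]
    # One stitch term: stand-in for agreement between neighbouring cameras.
    stitch = sum(
        sum((a - b) ** 2 for a, b in zip(residuals[i], residuals[i + 1]))
        for i in range(n - 1)
    )
    return sum(per_camera) + stitch


def mu_plus_lambda(loss, start, mu=5, lam=20, sigma=0.05, generations=60, seed=0):
    """Elitist (mu + lambda) evolution strategy with Gaussian mutation."""
    rng = random.Random(seed)

    def mutate(ind):
        return [gene + rng.gauss(0.0, sigma) for gene in ind]

    # Start from the stale calibration plus perturbed copies of it.
    parents = [list(start)] + [mutate(start) for _ in range(mu - 1)]
    history = []
    for _ in range(generations):
        offspring = [mutate(rng.choice(parents)) for _ in range(lam)]
        # Plus-selection: the best mu of parents AND offspring survive,
        # so the best loss can never get worse between generations.
        parents = sorted(parents + offspring, key=loss)[:mu]
        history.append(loss(parents[0]))
    return parents[0], history


# Hypothetical "true" poses for N = 2 cameras and a drifted starting point.
targets = [[0.1, -0.2, 0.05, 1.0, 2.0, 10.0],
           [0.1, 0.2, -0.05, -1.0, 2.0, 10.0]]
stale = [value + 0.3 for target in targets for value in target]


def loss_fn(individual):
    return composite_loss(individual, targets)


best, history = mu_plus_lambda(loss_fn, stale)
print(f"loss before: {loss_fn(stale):.4f}, after: {history[-1]:.4f}")
```

Because all cameras live in one individual, the stitch term can pull the poses toward mutual consistency while the per-camera terms pull each pose toward the template. Plus-selection keeps the best solution found so far, which is what makes the strategy elitist.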