Video-based Sequential Bayesian Homography Estimation for Soccer Field Registration

Paul J. Claasen, J. P. de Villiers

TL;DR

The paper addresses soccer-field registration by estimating the homography of each video frame within a Bayesian framework that explicitly models keypoint uncertainty and relates consecutive frames through an affine transformation. It introduces BHITK (Bayesian Homography Inference from Tracked Keypoints), a two-stage Kalman filtering approach: a linear Kalman filter first filters the measured keypoint positions, and a non-linear extended Kalman filter (EKF) then operates on an extended state that includes the homography and field points, using measurements derived from the tracked keypoints. The authors demonstrate that augmenting existing keypoint detectors with BHITK yields substantial improvements in homography and keypoint metrics on WC14, TS-WorldCup, and the newly refined CARWC dataset, often outperforming substantially more expensive deep networks. The refined datasets and a homography annotation tool are publicly released, offering a practical path toward real-time game overlays, analytics, and augmented reality in sports broadcasting.

Abstract

A novel Bayesian framework is proposed, which explicitly relates the homography of one video frame to the next through an affine transformation while modelling keypoint uncertainty. The literature has previously used differential homography between subsequent frames, but not in a Bayesian setting. In cases where Bayesian methods have been applied, camera motion is not adequately modelled, and keypoints are treated as deterministic. The proposed method, Bayesian Homography Inference from Tracked Keypoints (BHITK), employs a two-stage Kalman filter and significantly improves on existing methods. Existing keypoint detection methods may be easily augmented with BHITK. It enables less sophisticated and less computationally expensive methods to outperform the state-of-the-art approaches in most homography evaluation metrics. Furthermore, the homography annotations of the WorldCup and TS-WorldCup datasets have been refined using a custom homography annotation tool that has been released for public use. The refined datasets are combined and released as the consolidated and refined WorldCup (CARWC) dataset.
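
The relation between consecutive homographies can be illustrated with a small sketch. It assumes that the homography $\mathbf{H}_t$ maps field-template coordinates to image coordinates and that the affine transformation $\mathbf{A}_t$ maps image points of frame $t-1$ to frame $t$; both conventions, and the function names `project` and `propagate_homography`, are assumptions made for illustration and are not taken from the paper. Under these assumptions the homography propagates as $\mathbf{H}_t = \mathbf{A}_t \mathbf{H}_{t-1}$.

```python
import numpy as np

def project(H, pts):
    """Apply a 3x3 projective transform H to an (N, 2) array of points."""
    pts_h = np.hstack([pts, np.ones((len(pts), 1))])  # to homogeneous coordinates
    out = pts_h @ H.T
    return out[:, :2] / out[:, 2:3]                   # back to Cartesian coordinates

def propagate_homography(H_prev, A):
    """Propagate a field-to-image homography with a 3x3 affine frame-to-frame motion.

    Assumed convention: image_t = A @ image_{t-1}, so H_t = A @ H_{t-1}.
    """
    return A @ H_prev

# Hypothetical values chosen only for illustration.
H_prev = np.array([[8.0, 1.0, 100.0],
                   [0.5, 6.0, 50.0],
                   [0.0, 0.002, 1.0]])
A = np.array([[1.01, 0.02, 3.0],
              [-0.02, 1.01, -1.5],
              [0.0, 0.0, 1.0]])
field_pts = np.array([[0.0, 0.0], [52.5, 34.0], [105.0, 68.0]])

# Moving the previous image points with A ...
img_next_via_A = project(A, project(H_prev, field_pts))
# ... gives the same result as projecting the field points with the propagated homography.
img_next_via_H = project(propagate_homography(H_prev, A), field_pts)
```

The two routes agree because composing projective transforms is equivalent to multiplying their matrices; if the opposite mapping direction is used (image to field), the inverse of the affine matrix would appear on the right instead.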

Paper Structure (34 sections, 29 equations, 6 figures, 5 tables)

This paper contains 34 sections, 29 equations, 6 figures, and 5 tables.

Figures (6)

  • Figure 1: Implemented Kalman filter framework. The first stage filters measured keypoint positions $\mathbf{y}^{I}_t$ according to the estimated affine transformation $\mathbf{\hat{A}}_t$. The EKF makes use of the filtered positions, $\mathbf{\hat{x}}^{I}_t$, and the estimated affine transformation to infer an estimate of the homography, $\mathbf{\hat{H}}_t$.
  • Figure 2: Re-projected keypoints using WC14 and custom (CARWC) annotated homographies. The re-projection error of the WC14 annotation is most noticeable when considering the alignment with the right-most grass band.
  • Figure 3: Re-projected keypoints using TSWC and custom (CARWC) annotated homographies. The re-projection error of the TSWC annotation is most noticeable when considering the alignment with the bottom length-wise horizontal field line.
  • Figure 4: Keypoint (KP) measurement error distribution in the x- and y-dimensions with the baseline model of [robust], trained and evaluated on the CARWC training set.
  • Figure 5: This figure depicts both the re-projected keypoints within the image and the image projected onto the field template. The projection and re-projection in each sub-figure utilise distinct homography estimates: one derived from the method of [robust] and the other obtained by augmenting this same network with the proposed method (BHITK). The sub-figures represent the same frame from the same test video. The red circles represent the keypoints re-projected using the predicted homography, and the green circles represent the keypoints re-projected using the ground truth homography.
  • ...and 1 more figures
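
The first stage described in the Figure 1 caption can be sketched as a linear Kalman filter on a single keypoint position. This is a minimal illustration under stated assumptions, not the authors' implementation: the affine estimate is taken as a known 2x3 matrix, the measurement model is the identity, and the noise covariances `Q` and `R` and all numeric values are hypothetical. The second-stage EKF is not shown.

```python
import numpy as np

def kf_predict(x, P, A, Q):
    """Predict step: move the keypoint with the 2x3 affine motion A = [M | t]."""
    M, t = A[:, :2], A[:, 2]
    x_pred = M @ x + t          # predicted image position
    P_pred = M @ P @ M.T + Q    # predicted covariance with process noise Q
    return x_pred, P_pred

def kf_update(x_pred, P_pred, y, R):
    """Update step: fuse the measured keypoint position y (identity measurement model)."""
    S = P_pred + R                       # innovation covariance
    K = P_pred @ np.linalg.inv(S)        # Kalman gain
    x = x_pred + K @ (y - x_pred)        # filtered position
    P = (np.eye(2) - K) @ P_pred         # filtered covariance
    return x, P

# Hypothetical values: previous estimate, its covariance, and a pure translation.
x_prev = np.array([100.0, 50.0])
P_prev = 4.0 * np.eye(2)
A_hat = np.array([[1.0, 0.0, 2.0],
                  [0.0, 1.0, -1.0]])
Q = np.eye(2)           # assumed process noise
R = 9.0 * np.eye(2)     # assumed measurement noise
y = np.array([104.0, 48.0])  # detected keypoint position in the new frame

x_pred, P_pred = kf_predict(x_prev, P_prev, A_hat, Q)
x_filt, P_filt = kf_update(x_pred, P_pred, y, R)
```

With these values the filtered position lies between the motion-model prediction and the noisy detection, weighted by their respective covariances, and the filtered covariance is smaller than the predicted one.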