Enhancing Multi-Camera Gymnast Tracking Through Domain Knowledge Integration

Fan Yang, Shigeyuki Odashima, Shoichi Masui, Ikuo Kusajima, Sosuke Yamao, Shan Jiang

TL;DR

This work tackles robust 3D gymnast tracking under limited cross-view observations for gymnastics judging. It introduces a domain-knowledge–driven cascaded data association that switches between triangulation and ray-plane intersection, leveraging the tendency of gymnasts to move within a predefined vertical plane. The approach, built on four calibrated RGB cameras and a multi-stage processing pipeline, shows substantial reductions in ID switches and pose-estimation errors compared with state-of-the-art baselines, especially when only two opposing views are available, and has been deployed at the Gymnastics World Championships. The method promises practical benefits for objective judging and broader sport-video analysis by integrating sport-specific constraints into multi-camera tracking.

Abstract

We present a robust multi-camera gymnast tracking method, which has been applied at international gymnastics championships for gymnastics judging. Despite considerable progress in multi-camera tracking algorithms, tracking gymnasts presents unique challenges: (i) due to space restrictions, only a limited number of cameras can be installed in the gymnastics stadium; and (ii) due to variations in lighting, background, uniforms, and occlusions, multi-camera gymnast detection may fail in certain views and provide valid detections from only two opposing views. These factors complicate the accurate determination of a gymnast's 3D trajectory using conventional multi-camera triangulation. To alleviate this issue, we incorporate gymnastics domain knowledge into our tracking solution. Given that a gymnast's 3D center typically lies within a predefined vertical plane during much of their performance, we can apply a ray-plane intersection to generate coplanar 3D trajectory candidates for opposing-view detections. More specifically, we propose a novel cascaded data association (DA) paradigm that employs triangulation to generate 3D trajectory candidates when cross-view detections are sufficient, and resorts to the ray-plane intersection when they are insufficient. Consequently, coplanar candidates are used to compensate for uncertain trajectories, thereby minimizing tracking failures. The robustness of our method is validated through extensive experimentation, demonstrating its superiority over existing methods in challenging scenarios. Furthermore, our gymnastics judging system, equipped with this tracking method, has been successfully applied to recent Gymnastics World Championships, earning significant recognition from the International Gymnastics Federation.
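
The switch between triangulation and ray-plane intersection described above can be illustrated with a minimal Python sketch. This is not the authors' implementation. It assumes each 2D detection has already been back-projected to a 3D ray (camera center plus direction) and that the vertical plane is given by a point and a normal. All function names, the two-ray setup, and the `min_sin` threshold used to decide whether cross-view detections are "sufficient" are hypothetical; the abstract does not specify that criterion.

```python
import math
from typing import List, Optional, Tuple

Vec3 = Tuple[float, float, float]
Ray = Tuple[Vec3, Vec3]  # (camera center, viewing direction)


def dot(a: Vec3, b: Vec3) -> float:
    return sum(x * y for x, y in zip(a, b))


def sub(a: Vec3, b: Vec3) -> Vec3:
    return tuple(x - y for x, y in zip(a, b))


def along(origin: Vec3, direction: Vec3, t: float) -> Vec3:
    return tuple(o + t * d for o, d in zip(origin, direction))


def ray_plane_intersection(origin: Vec3, direction: Vec3,
                           plane_point: Vec3, plane_normal: Vec3,
                           eps: float = 1e-9) -> Optional[Vec3]:
    """Point where the ray meets the plane, or None if there is none."""
    denom = dot(plane_normal, direction)
    if abs(denom) < eps:
        return None  # ray is parallel to the plane
    t = dot(plane_normal, sub(plane_point, origin)) / denom
    if t < 0:
        return None  # plane lies behind the camera
    return along(origin, direction, t)


def midpoint_triangulation(o1: Vec3, d1: Vec3, o2: Vec3, d2: Vec3,
                           eps: float = 1e-9) -> Optional[Vec3]:
    """Midpoint of the shortest segment between two viewing rays."""
    w = sub(o1, o2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w), dot(d2, w)
    denom = a * c - b * b
    if abs(denom) < eps:
        return None  # (anti-)parallel rays: depth is unobservable
    s = (b * e - c * d) / denom
    t = (a * e - b * d) / denom
    p1, p2 = along(o1, d1, s), along(o2, d2, t)
    return tuple((x + y) / 2 for x, y in zip(p1, p2))


def cascaded_candidates(ray_a: Ray, ray_b: Ray,
                        plane_point: Vec3, plane_normal: Vec3,
                        min_sin: float = 0.2) -> Tuple[str, List[Vec3]]:
    """Triangulate when the rays are well separated in angle (assumed
    criterion); otherwise fall back to one coplanar candidate per ray."""
    (o1, d1), (o2, d2) = ray_a, ray_b
    cos = dot(d1, d2) / math.sqrt(dot(d1, d1) * dot(d2, d2))
    sin = math.sqrt(max(0.0, 1.0 - cos * cos))
    if sin >= min_sin:
        p = midpoint_triangulation(o1, d1, o2, d2)
        if p is not None:
            return "triangulation", [p]
    hits = [ray_plane_intersection(o, d, plane_point, plane_normal)
            for o, d in (ray_a, ray_b)]
    return "ray_plane", [h for h in hits if h is not None]
```

For two opposing cameras the viewing rays are nearly anti-parallel, so triangulated depth is poorly constrained; in this sketch such a pair falls through to the plane intersection, which yields one coplanar candidate per view.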

Paper Structure

This paper contains 18 sections, 14 equations, 8 figures, 5 tables, 1 algorithm.

Figures (8)

  • Figure 1: Overview of our proposal. While conventional multi-camera multi-target tracking may fail due to limited opposite-view observations, our method incorporates gymnastics domain knowledge to improve tracking robustness.
  • Figure 2: Architecture of our gymnast tracking framework. We segment 2D tracklets into fragments in real time and collaboratively refine them using multi-view information. By incorporating gymnastics domain knowledge, we apply triangulation to generate 3D tracklet candidates when detections are sufficient and resort to ray-plane intersection when they are not. Valid candidates are subsequently fused under a cascaded data association paradigm. Thereafter, we reassemble fragmented tracklets by integrating both 2D and 3D information and identify the tracklet corresponding to the target gymnast. Utilizing the associated multi-view 2D tracklets of the target gymnast, we generate 3D poses and employ them for judging purposes. Note that, while the gray-colored modules are partially derived from the precursor work, UniMMT [yang2023unified], the orange-colored modules have been newly developed specifically for gymnast tracking.
  • Figure 3: 3D position estimation. (a) and (b) present a challenging scenario in which only two opposing-view detections are available: (a) shows the top-view scene and (b) the corresponding side-view scene. Both demonstrate that our ray-plane intersection yields better 3D positions than conventional triangulation. (c) illustrates that, while coplanar 3D tracklets are generated for multiple persons, the coplanar 3D tracklets of the target gymnast generally have small gaps, allowing them to be grouped together. (d) depicts how the 3D tracklet candidates of (c) are matched on the vertical plane.
  • Figure 4: The display of our gymnastics judging support system. Referring to angles within poses, a segment of the 3D pose sequence is assigned to the gymnastics code of points [de20172020, de2018technical, palmer2022aesthetics] to which it corresponds.
  • Figure 5: Evaluating the suitability of 2D bbox for pose estimation. Whenever the ground-truth bbox length is less than half the $\boldsymbol{b}^{buf}$ or lies outside it, the $\boldsymbol{b}^{buf}$ is considered a failure because it impairs the pose estimation performance.
  • ...and 3 more figures
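
The failure criterion in the Figure 5 caption can be read as a simple predicate. The Python sketch below is one possible interpretation, not the paper's evaluation code. It assumes boxes are axis-aligned `(x_min, y_min, x_max, y_max)` tuples, that "length" means the longer side of a box, and that "lies outside" means the ground-truth box is not fully contained in the buffered box $\boldsymbol{b}^{buf}$. The function name `buffered_bbox_fails` is hypothetical.

```python
from typing import Tuple

BBox = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def buffered_bbox_fails(gt: BBox, buf: BBox) -> bool:
    """Return True if the buffered box is unsuitable for pose estimation
    under the assumed reading of the Figure 5 criterion."""
    # Assumption: "length" is the longer side of each box.
    gt_len = max(gt[2] - gt[0], gt[3] - gt[1])
    buf_len = max(buf[2] - buf[0], buf[3] - buf[1])
    too_small = gt_len < 0.5 * buf_len
    # Assumption: "outside" means any part of gt extends beyond buf.
    outside = (gt[0] < buf[0] or gt[1] < buf[1]
               or gt[2] > buf[2] or gt[3] > buf[3])
    return too_small or outside
```

Under this reading, a buffered box fails either because the gymnast occupies too little of it (the crop is too loose) or because part of the gymnast is cut off.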