Sports Analysis and VR Viewing System Based on Player Tracking and Pose Estimation with Multimodal and Multiview Sensors

Wenxuan Guo; Zhiyu Pan; Ziheng Xi; Alapati Tuerxun; Jianjiang Feng; Jie Zhou

TL;DR

The paper tackles the challenge of outdoor, large-scale 3D imaging for sports analysis and immersive VR/AR viewing by introducing a portable, wireless, multi-view, multi-modal system that fuses RGB cameras with LiDAR sensors. It implements GPS-based time synchronization and an automatic spatio-temporal calibration pipeline to enable accurate, multi-node data fusion in outdoor environments, and demonstrates these capabilities on a new THU-MultiLiCa dataset collected across diverse scenes. Through extensive experiments on 3D object detection and multi-object tracking, the study shows that increasing the number of views improves performance, with early fusion strategies yielding the highest accuracy while late fusion offers scalability. The work provides open-source hardware, code, and dataset, highlighting practical impact for outdoor sports analytics and VR/AR visualization, and laying groundwork for robust, real-time perception in large outdoor spaces.

Abstract

Sports analysis and viewing play a pivotal role in the current sports domain, offering significant value not only to coaches and athletes but also to fans and the media. In recent years, the rapid development of virtual reality (VR) and augmented reality (AR) technologies has introduced a new platform for watching games. Visualization of sports competitions in VR/AR represents a revolutionary technology, providing audiences with a novel immersive viewing experience. However, there is still a lack of related research in this area. In this work, we present for the first time a comprehensive system for sports competition analysis and real-time visualization on VR/AR platforms. First, we use multiview LiDARs and cameras to collect multimodal game data. Next, we propose a framework for multi-player tracking and pose estimation based on a limited amount of supervised data, which extracts precise player positions and movements from point clouds and images. We also perform avatar modeling of the players to obtain their 3D models. Finally, using these 3D player data, we conduct competition analysis and real-time visualization in VR/AR. Extensive quantitative experiments demonstrate the accuracy and robustness of our multi-player tracking and pose estimation framework. The visualization results showcase the immense potential of our sports visualization system for watching games on VR/AR devices. The multimodal competition dataset we collected and all related code will be released soon.

Paper Structure (21 sections, 10 figures, 4 tables)

Introduction
Related Work
Proposed Imaging System
Structural Design
Spatio-temporal Calibration
Portable Node
Outdoor Scenes 3D Dataset
Overview
Acquisition Scenes
Crossroad Scene
Multi-person Scene
System Reliability
Experimental Results
Evaluation Metrics
3D Object Detection
...and 6 more sections

Figures (10)

Figure 1: 3D imaging data collected by our system at a crossroad. The multi-view point cloud, colored in different shades of blue, comes from four different LiDARs. Points belonging to pedestrians are colored orange and points belonging to cars are colored yellow. The four slave nodes are represented by red rectangular pyramids, and their RGB images are displayed at the top.
Figure 2: The architecture of our system. The master node controls the entire system via a wireless network. Each modularized slave node uses an RGB camera and a Livox LiDAR to capture 3D imaging data.
Figure 3: The design model of a slave node. Its two parts can be disassembled and easily screwed together. A camera, a LiDAR, an NVIDIA AGX Xavier, and a battery are fixed to the assembly body.
Figure 4: Four acquisition scenes of our dataset. (a) is the crossroad scene with many vehicles and pedestrians. (b) is a square more than 50 meters in size. (c) and (d) are the multi-person scene during the day and at night. In these scenes, we acquired over 10,000 frames with high-precision synchronization and calibration from the four slave nodes of our system.
Figure 4: Tracking results on the multi-person dataset.
...and 5 more figures
