PitcherNet: Powering the Moneyball Evolution in Baseball Video Analytics

Jerrin Bright, Bavesh Balaji, Yuhao Chen, David A Clausi, John S Zelek

TL;DR

PitcherNet tackles the challenge of extracting pitcher kinematics and pitch statistics from live broadcast video by integrating robust 3D human modeling with a kinematic-driven analysis pipeline. It decouples action from tracklets for reliable pitcher identification, uses D2A-HMR 2.0 with a Depth Anything encoder to estimate 3D pose, and derives statistics such as pitch position, release point, velocity, and release extension from the kinematic data. The approach achieves state-of-the-art performance on MLBPitchDB, including high pitcher-tracklet identification accuracy and strong 3D pose metrics, aided by depth-based improvements and pseudo-ground-truth data. This work enables real-time, data-driven baseball analytics for coaching, strategy, injury prevention, and deeper biomechanical understanding of pitching mechanics in live-game settings.

Abstract

In the high-stakes world of baseball, every nuance of a pitcher's mechanics holds the key to maximizing performance and minimizing runs. Traditional analysis methods often rely on pre-recorded offline numerical data, which hinders their application in the dynamic environment of live games. Broadcast video analysis, while seemingly ideal, faces significant challenges due to factors like motion blur and low resolution. To address these challenges, we introduce PitcherNet, an end-to-end automated system that analyzes pitcher kinematics directly from live broadcast video, thereby extracting valuable pitch statistics including velocity, release point, pitch position, and release extension. This system leverages three key components: (1) player tracking and identification by decoupling actions from player kinematics; (2) distribution- and depth-aware 3D human modeling; and (3) kinematic-driven pitch statistics. Experimental validation demonstrates that PitcherNet achieves robust analysis results, with 96.82% accuracy in pitcher tracklet identification, a 1.8 mm reduction in joint position error, and superior analytics compared to baseline methods. By enabling performance-critical kinematic analysis from broadcast video, PitcherNet paves the way for the future of baseball analytics by optimizing pitching strategies, preventing injuries, and unlocking a deeper understanding of pitcher mechanics, forever transforming the game.

Paper Structure (33 sections, 6 equations, 10 figures, 6 tables, 1 algorithm)

Figures (10)

  • Figure 1: 3D player reconstruction and kinematic-driven pitch statistics from monocular video. We introduce PitcherNet, a pioneering deep learning system that tackles low-resolution video limitations through efficient 3D human modeling for robust player alignment (left) and reliable pitch statistics analysis from estimated kinematic data (right).
  • Figure 2: Overall architecture. Given a broadcast video, we begin by extracting player tracklets, denoted as $\mathcal{T} \in \{\mathcal{T}_1, \mathcal{T}_2, \ldots, \mathcal{T}_n\}$. Each tracklet $\mathcal{T}_k$ consists of a sequence of $N$ frames $\mathbf{F}_i \in \mathbb{R}^{H \times W \times 3}$. These tracklets are then processed by a Temporal Convolutional Network (TCN), which implicitly decouples player actions and identifies the pitcher's tracklet, $\mathcal{T}_p$. Subsequently, an encoder (E) derives pseudo-depth information for each frame of $\mathcal{T}_p$. The frames, along with their corresponding pseudo-depth data, are fed into a 3D human modeling framework (D2A-HMR 2.0), which predicts the 3D mesh and 3D joint positions of the pitcher. Pitch metrics are then derived from the temporal kinematic information in these 3D joint positions.
  • Figure 3: Temporal Convolutional Network. (a) Overview of the proposed TCN for the player identification task, where $\text{fc}$ denotes fully connected layers and $p$ refers to the model's output. (b) Architecture of the TConv block used in the TCN, where $C_{in}$, $C_{out}$, and $C_{prev}$ denote the input, output, and previous channels, respectively, and $k$ denotes the kernel size.
  • Figure 4: Data Augmentation Technique. Pseudo-ground truth pose is collected using a Transformer model for improved generalizability of the pose estimation model.
  • Figure 5: Trajectory of the right wrist joint in 3D space. The two frames shown correspond to points A and B marked in the trajectory plot, which are used to determine the release point.
  • ...and 5 more figures
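
The captions for Figures 2 and 5 describe deriving pitch statistics from the temporal trajectory of 3D joint positions, with the release point located on the wrist trajectory. The exact criterion PitcherNet uses is not given in this excerpt, so the following is only a minimal sketch under stated assumptions: it treats the frame that ends the fastest wrist step as a proxy for the release frame, and it computes release extension as the distance of the release point in front of the pitching rubber along the direction of home plate. The function names, the peak-speed heuristic, and the coordinate conventions are all hypothetical and chosen for illustration.

```python
import math


def wrist_speeds(trajectory, fps):
    """Finite-difference wrist speed between consecutive 3D positions.

    trajectory: list of (x, y, z) wrist positions, one per frame.
    fps: frame rate of the video; speeds are in position units per second.
    """
    dt = 1.0 / fps
    return [math.dist(p, q) / dt for p, q in zip(trajectory, trajectory[1:])]


def estimate_release_index(trajectory, fps):
    """ASSUMPTION (not the paper's stated method): take the frame that ends
    the fastest frame-to-frame wrist displacement as the release frame."""
    speeds = wrist_speeds(trajectory, fps)
    if not speeds:
        raise ValueError("need at least two frames")
    fastest_step = max(range(len(speeds)), key=speeds.__getitem__)
    return fastest_step + 1


def release_extension(release_point, rubber_point, plate_direction):
    """Distance of the release point in front of the pitching rubber,
    measured along the (not necessarily unit-length) direction to home plate."""
    norm = math.sqrt(sum(c * c for c in plate_direction))
    if norm == 0.0:
        raise ValueError("plate_direction must be non-zero")
    offset = [r - o for r, o in zip(release_point, rubber_point)]
    return sum(o * d for o, d in zip(offset, plate_direction)) / norm


if __name__ == "__main__":
    # Made-up wrist positions in metres at 30 fps, for illustration only.
    wrist = [
        (0.0, 1.0, 0.0),
        (0.1, 1.2, 0.0),
        (0.4, 1.6, 0.0),
        (1.2, 1.8, 0.0),
        (1.5, 1.5, 0.0),
    ]
    idx = estimate_release_index(wrist, fps=30)
    ext = release_extension(wrist[idx], (0.0, 0.0, 0.0), (1.0, 0.0, 0.0))
    print(idx, ext)
```

On real broadcast footage the estimated joints are noisy, so smoothing the trajectory before differencing would likely be needed; the sketch omits that to keep the idea visible.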