Interpretable Pre-Release Baseball Pitch Type Anticipation from Broadcast 3D Kinematics

Jerrin Bright, Michelle Lu, John Zelek

TL;DR

This pipeline chains a diffusion-based 3D pose backbone with automatic pitching-event detection, ground-truth-validated biomechanical feature extraction, and gradient-boosted classification over 229 kinematic features. The results establish an empirical ceiling near 80% accuracy and delineate where kinematic information ends and ball-flight information begins.

Abstract

How much can a pitcher's body reveal about the upcoming pitch? We study this question at scale by classifying eight pitch types from monocular 3D pose sequences, without access to ball-flight data. Our pipeline chains a diffusion-based 3D pose backbone with automatic pitching-event detection, ground-truth-validated biomechanical feature extraction, and gradient-boosted classification over 229 kinematic features. Evaluated on 119,561 professional pitches, the largest such benchmark to date, we achieve 80.4% accuracy using body kinematics alone. A systematic importance analysis reveals that upper-body mechanics contribute 64.9% of the predictive signal versus 35.1% for the lower body, with wrist position (14.8%) and trunk lateral tilt emerging as the most informative joint group and biomechanical feature, respectively. We further show that grip-defined variants (four-seam vs. two-seam fastball) are not separable from pose, establishing an empirical ceiling near 80% and delineating where kinematic information ends and ball-flight information begins.
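The upper- versus lower-body split quoted in the abstract comes from grouping per-feature importances. As a minimal sketch of how such an aggregation could work (not the authors' implementation), the following sums per-feature gain by joint and then by body region. The feature names, gain values, joint mapping, and the `UPPER_BODY` set are all invented for illustration and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical set of joints counted as upper body; the paper's grouping may differ.
UPPER_BODY = {"wrist", "elbow", "shoulder", "trunk"}


def aggregate_importance(feature_gain, feature_to_joint):
    """Sum per-feature gain by joint and normalize to shares that sum to 1."""
    totals = defaultdict(float)
    for name, gain in feature_gain.items():
        totals[feature_to_joint[name]] += gain
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("total importance is zero; nothing to normalize")
    return {joint: value / grand_total for joint, value in totals.items()}


def upper_lower_split(joint_share):
    """Collapse joint-level shares into upper- and lower-body totals."""
    upper = sum(share for joint, share in joint_share.items() if joint in UPPER_BODY)
    return {"upper": upper, "lower": 1.0 - upper}


# Toy gains (invented numbers, not results from the paper).
toy_gain = {"wrist_x": 3.0, "wrist_y": 1.0, "knee_angle": 2.0, "trunk_tilt": 2.0}
toy_joint = {"wrist_x": "wrist", "wrist_y": "wrist",
             "knee_angle": "knee", "trunk_tilt": "trunk"}

joint_share = aggregate_importance(toy_gain, toy_joint)
body_split = upper_lower_split(joint_share)
```

With a trained gradient-boosted model, `toy_gain` would be replaced by the model's per-feature gain scores; the grouping logic stays the same.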

Paper Structure

This paper contains 35 sections, 5 equations, 5 figures, and 7 tables.

Figures (5)

  • Figure 1: Overview of the proposed system. The input broadcast video is first processed by a 2D pose estimator [xu2023vitpose++] and a VLM [team2023gemini] to obtain the action prompt and per-frame 2D poses. These are passed to DreamPose3D [bright2025dreampose3d] to reconstruct the pitcher’s 3D pose sequence. From the 3D poses, we perform handedness estimation and event localization, and use these features together with the 3D pose sequence to classify the pitch type.
  • Figure 2: Feature importance decomposition across four complementary views: (a) top-10 individual features ranked by XGBoost gain, (b) joint-level importance aggregated over all features per joint, (c) feature category breakdown showing the relative contribution of raw poses, biomechanical metrics, temporal deltas, and handedness, and (d) anatomical body-part grouping with upper- vs. lower-body totals. All panels share a consistent blue palette; darker shades indicate higher importance or upper-body / biomech associations, lighter shades indicate lower-body or raw-pose features.
  • Figure 3: Normalized confusion matrix (row-normalized; values are per-class recall in %). Diagonal cells indicate correct classification rates per pitch type; off-diagonal cells indicate systematic confusions. Red boxes highlight the dominant fastball ambiguity (FF ↔ FT).
  • Figure 4: Feature importance aggregated by pitching event. All three events contribute within a 5% range, indicating that the full delivery, not just the release point, encodes pitch-type information.
  • Figure 5: Hand pose estimation with ball interaction. As future work, we plan to incorporate hand pose estimation together with the interaction object (the baseball) to capture grip-specific contextual cues, enabling more detailed analysis of pitch-type indicators.
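
The row normalization described in the Figure 3 caption can be illustrated with a short sketch. Each row of the count matrix is divided by its row total, so the diagonal reads as per-class recall in percent. The labels and predictions below are toy values invented for illustration, and the helper names `confusion_counts` and `row_normalize` are hypothetical, not from the paper.

```python
def confusion_counts(y_true, y_pred, labels):
    """Count matrix with rows as true classes and columns as predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for true_label, pred_label in zip(y_true, y_pred):
        counts[index[true_label]][index[pred_label]] += 1
    return counts


def row_normalize(counts):
    """Convert each row to percentages of that row's total (per-class recall on the diagonal)."""
    normalized = []
    for row in counts:
        total = sum(row)
        # A class with no true samples yields an all-zero row rather than a division error.
        normalized.append([100.0 * c / total if total else 0.0 for c in row])
    return normalized


# Toy data (invented): FF = four-seam fastball, FT = two-seam fastball, SL = slider.
labels = ["FF", "FT", "SL"]
y_true = ["FF", "FF", "FF", "FF", "FT", "FT", "SL", "SL"]
y_pred = ["FF", "FF", "FF", "FT", "FF", "FT", "SL", "SL"]

counts = confusion_counts(y_true, y_pred, labels)
recall_pct = row_normalize(counts)
```

In this toy case the off-diagonal mass sits in the FF and FT rows, which is the kind of pattern the red boxes in Figure 3 highlight.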