Integrating Unsupervised and Supervised Learning for the Prediction of Defensive Schemes in American football
Rouven Michels, Robert Bajons, Jan-Ole Fischer
TL;DR
This work predicts NFL defensive coverage schemes (man vs. zone) from pre-snap motion and player tracking data by coupling an unsupervised non-homogeneous hidden Markov model (HMM) with supervised learners. The HMM infers the latent assignments of defenders to the offensive players they guard over the course of the motion. The decoded state sequences are summarized as features, such as switch counts and entropy, that feed elastic net and XGBoost classifiers. These features improve predictive accuracy and, as assessed with the Generalized Covariance Measure, are significantly associated with coverage outcomes. Key contributions include the integration of random effects in the HMM, a data-driven lag selection for defender reaction time, and a rigorous out-of-sample evaluation demonstrating the practical value of latent-state features for understanding defensive behavior and the advantages offenses gain from motion. The framework is modular, enabling extensions to finer-grained coverages and potential neural-network integrations, with implications for broader analytics in team sports.
Abstract
Anticipating defensive coverage schemes is a crucial yet challenging task for offenses in American football. Because defenders' assignments are intentionally disguised before the snap, they remain difficult to recognize in real time. To address this challenge, we develop a statistical framework that integrates supervised and unsupervised learning using player tracking data. Our goal is to forecast the defensive coverage scheme (man or zone) through elastic net logistic regression and gradient-boosted decision trees, using feature sets that are built up incrementally. We first use features from the pre-motion situation, then incorporate players' trajectories during motion in a naive way, and finally include features derived from a hidden Markov model (HMM). Based on player movements, the non-homogeneous HMM infers latent defensive assignments between offensive and defensive players during motion, and the decoded state sequences are transformed into informative features for the supervised models. These HMM-based features enhance predictive performance and are significantly associated with coverage outcomes. Moreover, estimated random effects offer interpretable insights into how different defenses and positions adjust their coverage responsibilities.
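As a rough illustration of how a decoded state sequence might be turned into features for the supervised models, the following Python sketch computes a switch count and an entropy for a single defender. It is not the authors' implementation: the function name `sequence_features`, the player labels, and the choice of Shannon entropy over the empirical distribution of decoded assignments are all assumptions made for this example.

```python
from collections import Counter
from math import log


def sequence_features(states):
    """Summarize one defender's decoded assignment sequence.

    `states` is a hypothetical list with one entry per tracking frame,
    naming the offensive player the defender is decoded as guarding.
    """
    if not states:
        raise ValueError("states must be non-empty")

    # Number of frames at which the decoded assignment changes.
    switches = sum(a != b for a, b in zip(states, states[1:]))

    # Shannon entropy (in nats) of the empirical distribution over
    # assignments; this definition is an assumption for illustration.
    n = len(states)
    entropy = sum((c / n) * log(n / c) for c in Counter(states).values())

    return {"switches": switches, "entropy": entropy}


# A defender who stays with one receiver throughout the motion.
stable = sequence_features(["WR1"] * 6)

# A defender who hands off one receiver and picks up another.
handoff = sequence_features(["WR1", "WR1", "WR1", "TE", "TE", "TE"])
```

Under these assumptions, a defender who follows a single player through the motion yields zero switches and zero entropy, whereas a defender who passes off a receiver yields positive values for both. This gives one plausible reading of why such features could help separate man from zone coverage.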
