Finding Pre-Injury Patterns in Triathletes from Lifestyle, Recovery and Load Dynamics Features
Leonardo Rossi, Bruno Rodrigues
TL;DR
This work tackles the high risk of overuse injuries in triathletes by recognizing that injury risk arises from a combination of load, recovery, sleep, and lifestyle factors, not just training volume. It introduces a synthetic data generation framework that simulates physiologically plausible athletes, periodized training plans, daily wearable-derived signals, and structured injury patterns to enable context-aware injury prediction. Evaluations of LASSO, Random Forest, and XGBoost on the synthetic data show AUCs up to about 0.86 and highlight sleep disturbances, HRV, and stress as early indicators, demonstrating the framework’s potential to overcome real-world data limitations. The work also discusses a practical deployment pathway and acknowledges limitations when transferring synthetic insights to real-world athletes, aiming to progressively validate models with actual data over time.
Abstract
Triathlon training, which involves high-volume swimming, cycling, and running, places athletes at substantial risk for overuse injuries due to repetitive physiological stress. Current injury prediction approaches rely primarily on training load metrics and often neglect critical factors such as sleep quality, stress, and individual lifestyle patterns that significantly influence recovery and injury susceptibility. We introduce a novel synthetic data generation framework tailored specifically to triathlon. The framework generates physiologically plausible athlete profiles, simulates individualized training programs that incorporate periodization and load-management principles, and integrates daily-life factors such as sleep quality, stress levels, and recovery states. Machine learning models (LASSO, Random Forest, and XGBoost) evaluated on the synthetic data achieved high predictive performance (AUC up to 0.86) and identified sleep disturbances, heart rate variability, and stress as critical early indicators of injury risk. This wearable-driven approach not only enhances injury prediction accuracy but also offers a practical way to overcome real-world data limitations, providing a pathway toward holistic, context-aware athlete monitoring.
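As a rough illustration of the idea summarized above, the following minimal sketch simulates hypothetical athlete-days and compares a load-only risk score against a score that also uses sleep, HRV, and stress. It is not the paper's generator or its models: the feature ranges, the logistic link, every coefficient, and the names `simulate_day` and `auc` are assumptions made for illustration, and the scores are hand-written rather than fitted LASSO, Random Forest, or XGBoost models.

```python
import math
import random


def simulate_day(rng):
    """Draw one hypothetical athlete-day (all ranges and weights are assumed)."""
    features = {
        "sleep_quality": rng.uniform(0.0, 1.0),  # 1.0 = fully restorative sleep
        "hrv_z": rng.gauss(0.0, 1.0),            # HRV z-score vs. personal baseline
        "stress": rng.uniform(0.0, 1.0),         # 1.0 = maximal perceived stress
        "load": rng.uniform(0.0, 1.0),           # normalized daily training load
    }
    # Assumed logistic link: poor sleep, suppressed HRV, high stress,
    # and high load all raise the probability of an injury label.
    logit = (
        -3.0
        + 2.0 * (1.0 - features["sleep_quality"])
        - 0.8 * features["hrv_z"]
        + 1.5 * features["stress"]
        + 1.0 * features["load"]
    )
    p_injury = 1.0 / (1.0 + math.exp(-logit))
    injured = 1 if rng.random() < p_injury else 0
    return features, injured


def auc(scores, labels):
    """Rank-based AUC (Mann-Whitney U statistic); ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


rng = random.Random(0)  # fixed seed so the sketch is reproducible
days = [simulate_day(rng) for _ in range(2000)]
labels = [injured for _, injured in days]

# Load-only baseline: rank days by training load alone.
load_scores = [f["load"] for f, _ in days]

# Context-aware score: also uses sleep, HRV, and stress (hand-set weights).
context_scores = [
    2.0 * (1.0 - f["sleep_quality"]) - 0.8 * f["hrv_z"] + 1.5 * f["stress"] + f["load"]
    for f, _ in days
]

load_auc = auc(load_scores, labels)
context_auc = auc(context_scores, labels)
print(f"load-only AUC:     {load_auc:.3f}")
print(f"context-aware AUC: {context_auc:.3f}")
```

Because the context-aware score reuses the weights of the assumed generating process, it acts as an upper reference for this toy setup rather than a learned model; the point is only that, when injury labels depend on recovery and lifestyle signals, a load-only score separates injured from non-injured days less well.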
