Finding Pre-Injury Patterns in Triathletes from Lifestyle, Recovery and Load Dynamics Features

Leonardo Rossi, Bruno Rodrigues

TL;DR

This work tackles the high risk of overuse injuries in triathletes by recognizing that injury risk arises from a combination of load, recovery, sleep, and lifestyle factors, not just training volume. It introduces a synthetic data generation framework that simulates physiologically plausible athletes, periodized training plans, daily wearable-derived signals, and structured injury patterns to enable context-aware injury prediction. Evaluations of LASSO, Random Forest, and XGBoost on the synthetic data show AUCs up to about 0.86 and highlight sleep disturbances, HRV, and stress as early indicators, demonstrating the framework’s potential to overcome real-world data limitations. The work also discusses a practical deployment pathway and acknowledges limitations when transferring synthetic insights to real-world athletes, aiming to progressively validate models with actual data over time.

Abstract

Triathlon training, which involves high-volume swimming, cycling, and running, places athletes at substantial risk of overuse injuries due to repetitive physiological stress. Current injury prediction approaches rely primarily on training load metrics and often neglect critical factors such as sleep quality, stress, and individual lifestyle patterns, which significantly influence recovery and injury susceptibility. We introduce a novel synthetic data generation framework tailored explicitly to triathlon. The framework generates physiologically plausible athlete profiles, simulates individualized training programs that incorporate periodization and load-management principles, and integrates daily-life factors such as sleep quality, stress levels, and recovery states. Machine learning models (LASSO, Random Forest, and XGBoost) evaluated on the generated data achieved high predictive performance (AUC up to 0.86) and identified sleep disturbances, heart rate variability, and stress as critical early indicators of injury risk. This wearable-driven approach not only enhances injury prediction accuracy but also provides a practical way to overcome real-world data limitations, offering a pathway toward holistic, context-aware athlete monitoring.
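To make the idea concrete, the following is a minimal sketch of the two stages the abstract describes: generating synthetic daily signals with an injected pre-injury degradation pattern, and scoring a predictor by AUC. It is not the authors' framework. The baselines, noise levels, 7-day window, labeling of the window days as positives, and the names `simulate_athlete`, `risk_score`, and `auc` are all assumptions made for illustration, and the hand-written score stands in for a trained model such as LASSO, Random Forest, or XGBoost.

```python
import random


def simulate_athlete(days=120, injury_day=None, window=7, seed=0):
    """Simulate daily wearable-style signals for one hypothetical athlete.

    All baselines and effect sizes are illustrative assumptions,
    not values taken from the paper.
    """
    rng = random.Random(seed)
    records = []
    for day in range(days):
        hrv = rng.gauss(70.0, 5.0)     # heart rate variability (ms)
        sleep = rng.gauss(0.80, 0.05)  # sleep quality on a 0..1 scale
        stress = rng.gauss(30.0, 5.0)  # stress score on a 0..100 scale
        label = 0
        in_window = (
            injury_day is not None
            and injury_day - window <= day < injury_day
        )
        if in_window:
            # Degrade recovery metrics progressively as the injury nears.
            progress = (day - (injury_day - window) + 1) / window
            hrv -= 15.0 * progress
            sleep -= 0.20 * progress
            stress += 20.0 * progress
            label = 1  # assumption: pre-injury window days are positives
        records.append(
            {"hrv": hrv, "sleep": sleep, "stress": stress, "label": label}
        )
    return records


def risk_score(rec):
    """Sum of standardized deviations from the assumed baselines."""
    return (
        (70.0 - rec["hrv"]) / 5.0
        + (0.80 - rec["sleep"]) / 0.05
        + (rec["stress"] - 30.0) / 5.0
    )


def auc(scores, labels):
    """AUC as the probability that a positive outranks a negative."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Twenty hypothetical athletes; every second one has an injury on day 90.
dataset = []
for i in range(20):
    injury_day = 90 if i % 2 == 0 else None
    dataset.extend(simulate_athlete(injury_day=injury_day, seed=i))

scores = [risk_score(r) for r in dataset]
labels = [r["label"] for r in dataset]
synthetic_auc = auc(scores, labels)
print(f"AUC on synthetic data: {synthetic_auc:.3f}")
```

Because the injected pattern is exactly what the score measures, the AUC here is optimistic by construction. This is the same caveat the TL;DR raises about transferring synthetic insights to real athletes.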

Paper Structure

This paper contains 27 sections, 2 equations, 11 figures, 4 tables.

Figures (11)

  • Figure 1: Methodology overview highlighting (A) synthetic data generation and (B) ML analysis stages.
  • Figure 2: Data generation pipeline detailing morning assessment, workout execution, and recovery modeling.
  • Figure 3: Training periodization across base, build, peak, and taper phases with controlled load increments.
  • Figure 4: Injury pattern injection mechanism simulating degradation in recovery metrics prior to labeled injury events.
  • Figure 5: Distribution of key physiological metrics in the competitive age-group triathlete population. Top-left: HRV; top-right: RHR; bottom-left: sleep quality; bottom-right: stress score.
  • ...and 6 more figures