ViTs for Action Classification in Videos: An Approach to Risky Tackle Detection in American Football Practice Videos

Syed Ahsan Masud Zaidi, William Hsu, Scott Dietrich

Abstract

Early identification of hazardous actions in contact sports enables timely intervention and improves player safety. We present a method for detecting risky tackles in American football practice videos and introduce a substantially expanded dataset for this task. Our dataset contains 733 single-athlete dummy-tackle clips, each temporally localized around the first point of contact and labeled with the strike-zone component of the standardized Assessment for Tackling Technique (SATT-3), extending prior work that reported 178 annotated videos. Using a Vision Transformer-based model with imbalance-aware training, we obtain a risky recall of 0.67 and a risky F1 of 0.59 under cross-validation. Relative to the previous baseline on a smaller subset (risky recall 0.58; risky F1 0.56), our approach improves risky recall by more than 8 percentage points on a much larger dataset. These results indicate that Vision Transformer-based video analysis, coupled with careful handling of class imbalance, can reliably detect rare but safety-critical tackling patterns, offering a practical pathway toward coach-centered injury-prevention tools.

Paper Structure

This paper contains 15 sections, 5 equations, 6 figures, 3 tables.

Figures (6)

  • Figure 1: Sample original (left) vs. augmented (right) frames for Run 8.
  • Figure 2: Temporal localization: Extracting tackle event from raw videos.
  • Figure 3: Pipeline overview: first-contact localization with fixed-window trimming, Taguchi $L_{18}$-guided augmentation, stratified 5-fold cross-validation, and ViViT training for Risky Tackle Detection
  • Figure 4: Performance heatmap showing mean scores across 5-fold cross-validation for all Taguchi configurations and supplementary runs. Rows correspond to evaluation metrics and columns to experimental runs. Cell values represent mean scores (0-1) computed using per-fold operating thresholds selected to maximize macro-F1. Black boxes highlight the best-performing configuration for each metric. Run 15 achieves optimal performance on both critical metrics: risky recall (0.67) and risky F1 (0.59).
  • Figure 5: Detailed performance comparison across evaluation metrics for selected augmentation configurations. Run 15 (highlighted) demonstrates superior risky-class detection (recall = 0.67, F1 = 0.59) while maintaining balanced performance across metrics. The supplementary runs (original imbalanced and duplicated baseline) serve as reference points, illustrating the effectiveness of systematic augmentation design over naive class balancing.
  • ...and 1 more figure
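The Figure 4 caption mentions per-fold operating thresholds selected to maximize macro-F1. The paper does not give the selection procedure itself, but a minimal sketch of such a threshold sweep, with an illustrative grid and hand-rolled macro-F1 (all names and the grid spacing are assumptions, not from the paper), could look like:

```python
import numpy as np

def f1(y_true, y_pred, positive):
    """Per-class F1 for the given positive label (0 = safe, 1 = risky)."""
    tp = np.sum((y_pred == positive) & (y_true == positive))
    fp = np.sum((y_pred == positive) & (y_true != positive))
    fn = np.sum((y_pred != positive) & (y_true == positive))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def macro_f1(y_true, y_pred):
    """Unweighted mean of the safe-class and risky-class F1 scores."""
    return 0.5 * (f1(y_true, y_pred, 0) + f1(y_true, y_pred, 1))

def select_threshold(y_true, risky_scores, grid=np.linspace(0.05, 0.95, 19)):
    """Sweep a threshold grid on one validation fold; keep the macro-F1 maximizer.

    `risky_scores` are the model's predicted probabilities for the risky class.
    Ties are broken toward the larger threshold.
    """
    best_f1, best_t = max(
        (macro_f1(y_true, (risky_scores >= t).astype(int)), t) for t in grid
    )
    return best_t, best_f1
```

In a stratified 5-fold setup, this selection would run independently on each fold's validation split, and the reported metrics would use each fold's own operating threshold rather than a fixed 0.5 cutoff.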