Classification of Tennis Actions Using Deep Learning
Emil Hovad, Therese Hougaard-Jensen, Line Katrine Harder Clemmensen
TL;DR
This work evaluates deep learning for tennis action recognition using the SlowFast architecture on the THETIS RGB dataset, addressing the feasibility and challenges of classifying dense, fine-grained actions with limited data. It shows that SlowFast 4x16 achieves about 74% generalization accuracy, outperforming the other variants and previous domain-specific results, and it provides a detailed error analysis that highlights the loss of cues when the ball is absent. The study identifies key limitations of THETIS RGB, namely the lack of ball trajectories and court positioning, and argues for higher-quality datasets and potential transfer-learning opportunities to advance tennis-video understanding. Overall, the results demonstrate the promise of two-pathway video models for sport action recognition and outline concrete directions for improving dataset quality and evaluation in this domain.
Abstract
Recent advances in deep learning make it possible to identify specific events in videos with greater precision. This is highly relevant in sports such as tennis, for example to automatically collect game statistics or to replay actions of specific interest for game strategy or player improvement. In this paper, we investigate the potential and the challenges of using deep learning to classify tennis actions. Three models of different sizes, all based on the deep learning architecture SlowFast, were trained and evaluated on the academic tennis dataset THETIS. The best models achieve a generalization accuracy of 74%, demonstrating good performance for tennis action classification. We provide an error analysis for the best model and pinpoint directions for improvement of tennis datasets in general. We discuss the limitations of this dataset, the general limitations of current publicly available tennis datasets, and the future steps needed to make progress.
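As a rough illustration of the two quantities the abstract refers to, the sketch below computes a generalization accuracy on held-out clips and lists the most frequent misclassifications, which is one simple starting point for an error analysis. It is a minimal sketch and not the evaluation code used in this work: the function names `accuracy` and `top_confusions`, the class labels, and the toy predictions are all hypothetical.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of held-out clips whose predicted class matches the label."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def top_confusions(y_true, y_pred, k=3):
    """Return the k most frequent (true class, predicted class) error pairs."""
    errors = Counter((t, p) for t, p in zip(y_true, y_pred) if t != p)
    return errors.most_common(k)


# Hypothetical toy labels and predictions, for illustration only.
y_true = ["forehand", "forehand", "backhand", "serve",
          "serve", "backhand", "forehand", "serve"]
y_pred = ["forehand", "backhand", "backhand", "serve",
          "forehand", "backhand", "forehand", "serve"]

print(f"accuracy: {accuracy(y_true, y_pred):.2f}")
for (true_cls, pred_cls), count in top_confusions(y_true, y_pred):
    print(f"{true_cls} predicted as {pred_cls}: {count}")
```

Counting error pairs in this way shows which classes are mistaken for each other, which is the kind of information an error analysis typically builds on.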
