Modern Deep Learning Approaches for Cricket Shot Classification: A Comprehensive Baseline Study

Sungwoo Kang

TL;DR

This study tackles cricket shot classification from video by benchmarking seven deep learning approaches across four paradigms on the CricShot10 dataset. It introduces a state-of-the-art EfficientNet-B0 + GRU model with temporal attention and Optuna-based optimization, achieving 92.25% accuracy, while revealing substantial reproducibility gaps between published results and reproduced benchmarks. The authors provide a fully open-source, PyTorch Lightning-based framework to enable standardized evaluation and reproducible research in sports video analysis. The findings underscore the importance of consistent benchmarks and careful evaluation protocols to realize robust, production-ready cricket shot analysis, with implications for broadcasting, coaching, and fan engagement. Overall, modern architectures, when properly optimized under standardized protocols, surpass earlier baselines and demonstrate the value of reproducible baselines in sports AI.

Abstract

Cricket shot classification from video sequences remains a challenging problem in sports video analysis, requiring effective modeling of both spatial and temporal features. This paper presents the first comprehensive baseline study comparing seven different deep learning approaches across four distinct research paradigms for cricket shot classification. We implement and systematically evaluate traditional CNN-LSTM architectures, attention-based models, vision transformers, transfer learning approaches, and modern EfficientNet-GRU combinations on a unified benchmark. A critical finding of our study is the significant performance gap between claims in academic literature and practical implementation results. While previous papers reported accuracies of 96% (Balaji LRCN), 99.2% (IJERCSE), and 93% (Sensors), our standardized re-implementations achieve 46.0%, 55.6%, and 57.7% respectively. Our modern SOTA approach, combining EfficientNet-B0 with a GRU-based temporal model, achieves 92.25% accuracy, demonstrating that substantial improvements are possible with modern architectures and systematic optimization. All implementations follow modern MLOps practices with PyTorch Lightning, providing a reproducible research platform that exposes the critical importance of standardized evaluation protocols in sports video analysis research.
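To make the size of the reproducibility gap concrete, the reported and re-implemented accuracies quoted in the abstract can be subtracted directly. The snippet below is only an illustrative sketch: the dictionary layout and the helper name `accuracy_gaps` are assumptions introduced here, not part of the paper's framework, and the numbers are copied from the abstract.

```python
# Reported vs. re-implemented accuracy (percent), as quoted in the abstract.
reported = {"Balaji LRCN": 96.0, "IJERCSE": 99.2, "Sensors": 93.0}
reproduced = {"Balaji LRCN": 46.0, "IJERCSE": 55.6, "Sensors": 57.7}


def accuracy_gaps(reported, reproduced):
    """Hypothetical helper: reported minus reproduced accuracy, in percentage points."""
    return {name: round(reported[name] - reproduced[name], 1) for name in reported}


gaps = accuracy_gaps(reported, reproduced)
for name, gap in gaps.items():
    print(
        f"{name}: {reported[name]:.1f}% reported, "
        f"{reproduced[name]:.1f}% reproduced, gap {gap:.1f} points"
    )
```

Under these figures the gaps come to 50.0, 43.6, and 35.3 percentage points, so each re-implementation falls short of its published claim by more than a third of the full accuracy scale.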

Paper Structure

This paper contains 23 sections, 3 tables.