BoxingVI: A Multi-Modal Benchmark for Boxing Action Recognition and Localization

Rahul Kumar; Vipul Baghel; Sudhanshu Singh; Bikash Kumar Badatya; Shivam Yadav; Babji Srinivasan; Ravi Hegde

TL;DR

BoxingVI addresses the scarcity of realistic, annotated boxing data for vision-based action understanding. It introduces 6,915 temporally segmented punch clips spanning six punch types, drawn from 20 unedited YouTube sessions, with 2D pose trajectories and per-clip labels that support temporal localization and pose-conditioned recognition in monocular RGB video. The dataset is split into 15 training and 5 validation subjects across 18 athletes and provides frame-level punch boundaries, which enables evaluation under unconstrained conditions and supports applications in automated coaching, performance assessment, and digital-twin development. By aligning temporal, spatial, and semantic annotations in real-world boxing footage, BoxingVI offers a foundation for future extensions to multi-person interactions and cross-discipline combat analytics.
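The annotation layers listed above (a punch label per clip, frame-level start and end boundaries, and a 2D pose trajectory) can be pictured as one record per clip. The sketch below is a hypothetical in-memory representation, not BoxingVI's released file format: the field names, the integer label ids (the six punch-type names are not given here), and the per-frame keypoint layout are all assumptions. It also checks that no subject appears in both splits, assuming the 15/5 training/validation split is meant to be subject-disjoint.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Tuple

# Six punch types; class names are not listed in the text, so integer ids are used (assumption).
NUM_PUNCH_CLASSES = 6


@dataclass
class PunchClip:
    """Hypothetical record for one annotated punch clip (illustrative field names)."""
    clip_id: str
    session_id: str          # source sparring session
    subject_id: str          # athlete identifier used for the train/validation split
    label: int               # punch-type id in [0, NUM_PUNCH_CLASSES)
    start_frame: int         # first frame of the punch (inclusive)
    end_frame: int           # last frame of the punch (inclusive)
    # Optional 2D pose trajectory: one list of (x, y) joints per frame in the clip.
    keypoints: List[List[Tuple[float, float]]] = field(default_factory=list)

    def __post_init__(self) -> None:
        if not 0 <= self.label < NUM_PUNCH_CLASSES:
            raise ValueError(f"label {self.label} outside [0, {NUM_PUNCH_CLASSES})")
        if self.end_frame < self.start_frame:
            raise ValueError("end_frame must not precede start_frame")
        # If a pose trajectory is present, it must cover every frame of the clip.
        if self.keypoints and len(self.keypoints) != self.num_frames:
            raise ValueError("pose trajectory length does not match clip length")

    @property
    def num_frames(self) -> int:
        # Boundaries are inclusive, so add one.
        return self.end_frame - self.start_frame + 1

    def duration_seconds(self, fps: float) -> float:
        return self.num_frames / fps


def split_is_subject_disjoint(train: Iterable[PunchClip], val: Iterable[PunchClip]) -> bool:
    """True if no subject contributes clips to both the training and validation sets."""
    train_subjects = {c.subject_id for c in train}
    val_subjects = {c.subject_id for c in val}
    return train_subjects.isdisjoint(val_subjects)
```

For example, `PunchClip("c1", "s01", "A", 0, 10, 24)` spans 15 frames (0.5 s at 30 fps). Inclusive boundaries are an assumption here; if the released annotations use half-open intervals, `num_frames` would drop the `+ 1`.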

Abstract

Accurate analysis of combat sports using computer vision has gained traction in recent years, yet the development of robust datasets remains a major bottleneck due to the dynamic, unstructured nature of the actions and variations in recording environments. In this work, we present a comprehensive, well-annotated video dataset tailored for punch detection and classification in boxing. The dataset comprises 6,915 high-quality punch clips categorized into six distinct punch types, extracted from 20 publicly available YouTube sparring sessions involving 18 different athletes. Each clip is manually segmented and labeled to ensure precise temporal boundaries and consistent class assignments, and together the clips capture a wide range of motion styles, camera angles, and athlete physiques. The dataset is specifically curated to support research in real-time vision-based action recognition, especially in low-resource and unconstrained environments. By providing a rich benchmark with diverse punch examples, this contribution aims to accelerate progress in movement analysis, automated coaching, and performance assessment within boxing and related domains.
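Precise temporal boundaries matter because localization is usually scored by how well a predicted punch interval overlaps the annotated one. As a hedged illustration, the sketch below computes temporal intersection-over-union (tIoU), a common metric in temporal action localization; the abstract does not state which metric or threshold BoxingVI uses, so the inclusive frame intervals and the 0.5 threshold are assumptions.

```python
from typing import Tuple

Interval = Tuple[int, int]  # (start_frame, end_frame), both inclusive (assumption)


def temporal_iou(pred: Interval, gt: Interval) -> float:
    """Intersection-over-union of two inclusive frame intervals."""
    p_start, p_end = pred
    g_start, g_end = gt
    # Overlap length in frames; zero when the intervals do not intersect.
    inter = max(0, min(p_end, g_end) - max(p_start, g_start) + 1)
    union = (p_end - p_start + 1) + (g_end - g_start + 1) - inter
    return inter / union


def is_true_positive(pred: Interval, gt: Interval, pred_label: int, gt_label: int,
                     threshold: float = 0.5) -> bool:
    """A detection counts only if the punch type matches and the overlap clears the threshold."""
    return pred_label == gt_label and temporal_iou(pred, gt) >= threshold
```

A prediction of frames 10 to 24 against an annotated punch at frames 15 to 29 shares 10 of 20 covered frames, giving a tIoU of 0.5, which just meets the assumed threshold. Requiring the label to match as well couples localization with classification, mirroring the dataset's pairing of boundaries with punch types.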
