KabaddiPy: A package to enable access to Professional Kabaddi Data
Bhaskar Lalwani, Aniruddha Mukherjee
TL;DR
KabaddiPy addresses the data scarcity in professional Kabaddi by providing an open-source Python package to aggregate and standardize multi-source PKL data into a single, reproducible dataset. The approach uses web scraping to collect historical data across seasons from PKL, ProKabaddi, and companion sources, with data cleaning and structuring into team, player, and match-level metrics, and a central data repository for ease of access. The work enables downstream analytics, such as evaluating raider efficiency against varying defender counts, zone-based performance, and roster-level insights, and supports reproducible research by preserving data and code. The framework is positioned to expand to additional leagues (World Cup, British League) and to incorporate more granular data (auction data), enabling cross-league comparisons and more robust strategic analyses.
Abstract
Kabaddi, a contact team sport of Indian origin, has seen a dramatic rise in global popularity, highlighted by the upcoming Kabaddi World Cup in 2025 with over sixteen international teams participating, alongside flourishing national leagues such as the Indian Pro Kabaddi League (230 million viewers) and the British Kabaddi League. We present the first open-source Python module to make Kabaddi statistical data, currently scattered across multiple sources on the internet, easily accessible. The module was developed by systematically web-scraping and collecting team-wise, player-wise, and match-by-match data. The data has been cleaned, organized, and categorized into team overviews and player metrics, each filterable by season. Players are classified as raiders or defenders, and their best strategies for attacking, counter-attacking, and defending against different teams are highlighted. Our module enables continuous monitoring of rapidly growing data streams, helping researchers quickly build on the data to break down opponent gameplay and answer critical questions, such as the impact of player inclusion or exclusion on team performance and scoring patterns against specific teams. The data generated from Kabaddi tournaments has been used only sparsely, and coaches and players rely heavily on intuition to make decisions and craft strategies. Our module can be used to build predictive models, craft strategic game plans tailored to specific opponents, and identify hidden correlations in the data. This open-source module has the potential to save time, encourage analytical studies of Kabaddi gameplay and player dynamics, and foster reproducible research. The data and code are publicly available: https://github.com/kabaddiPy/kabaddiPy
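As an illustration of the kind of downstream analysis described above (raider efficiency against varying defender counts, filterable by season), the following is a minimal sketch in plain Python. It does not use the KabaddiPy API: the record layout, the field names (season, raider, defenders, points), the sample values, and the function raid_success_by_defenders are all hypothetical and chosen only for this example.

```python
from collections import defaultdict

# Hypothetical raid-level records. The field names and values are
# illustrative assumptions, not the schema or data shipped with KabaddiPy.
RAIDS = [
    {"season": 9, "raider": "Raider A", "defenders": 7, "points": 1},
    {"season": 9, "raider": "Raider A", "defenders": 7, "points": 0},
    {"season": 9, "raider": "Raider B", "defenders": 4, "points": 2},
    {"season": 10, "raider": "Raider A", "defenders": 7, "points": 0},
    {"season": 10, "raider": "Raider B", "defenders": 3, "points": 1},
]


def raid_success_by_defenders(raids, season=None):
    """Return {defender_count: share of raids that scored at least one point}.

    If `season` is given, only raids from that season are counted.
    """
    attempts = defaultdict(int)
    successes = defaultdict(int)
    for raid in raids:
        if season is not None and raid["season"] != season:
            continue
        attempts[raid["defenders"]] += 1
        if raid["points"] > 0:
            successes[raid["defenders"]] += 1
    # Sort by defender count so the result reads from fewest to most defenders.
    return {n: successes[n] / attempts[n] for n in sorted(attempts)}


if __name__ == "__main__":
    print(raid_success_by_defenders(RAIDS, season=9))
```

With real data, the hard-coded list would be replaced by records loaded from the package's dataset, and the same grouping could be extended to zones or individual raiders.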
