Visualization of Unstructured Sports Data -- An Example of Cricket Short Text Commentary

Swarup Ranjan Behera, Vijaya V Saradhi

TL;DR

This work addresses the gap where sports visualization predominantly uses structured data by introducing cricket short text commentary as an unstructured data source for visualization. It builds a computational framework using a confrontation matrix and Correspondence Analysis (CA) to extract strength and weakness rules for individual players, visualized with biplots and complemented by t-SNE clustering to reveal similar players. The approach is validated through expert comparison and Procrustes analysis, demonstrating reliable rule extraction and meaningful player groupings, with data and code publicly available. The methodology offers a new, ball-by-ball contextual perspective for analysts, coaches, and teams to augment strategic decision-making in cricket.
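To make the Correspondence Analysis step concrete, the following is a minimal sketch of CA computed through a singular value decomposition of the standardized residuals of a contingency table. It is an illustration under stated assumptions, not the authors' implementation: the feature names (`attacked`, `defended`, `beaten`, `short`, `full`, `good_length`, `spin`) and all counts are invented toy values, and the paper's actual confrontation matrix and rule-extraction criterion are not reproduced here.

```python
import numpy as np

def correspondence_analysis(table):
    """Classical CA via SVD. Assumes every row and column sum is positive.

    Returns row principal coordinates, column principal coordinates,
    and the principal inertias (squared singular values).
    """
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                       # correspondence matrix
    r = P.sum(axis=1)                     # row masses
    c = P.sum(axis=0)                     # column masses
    # Standardized residuals: (P - r c^T) / sqrt(r_i * c_j)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = (U * sv) / np.sqrt(r)[:, None]     # row principal coordinates
    cols = (Vt.T * sv) / np.sqrt(c)[:, None]  # column principal coordinates
    return rows, cols, sv ** 2

# Hypothetical toy confrontation matrix for one batsman:
# rows are batting outcomes, columns are bowling features (all values invented).
batting = ["attacked", "defended", "beaten"]
bowling = ["short", "full", "good_length", "spin"]
counts = np.array([
    [30, 22, 10, 18],
    [ 8, 12, 25, 15],
    [ 4,  3, 12,  6],
])

rows, cols, inertia = correspondence_analysis(counts)

# The first two coordinate columns are what a biplot would display.
for name, xy in zip(batting, rows[:, :2]):
    print(f"batting {name:12s} {xy.round(3)}")
for name, xy in zip(bowling, cols[:, :2]):
    print(f"bowling {name:12s} {xy.round(3)}")
```

In a biplot built from these coordinates, a batting outcome and a bowling feature that lie in the same direction from the origin are associated more often than independence would predict, which is the kind of relationship a strength or weakness rule is meant to capture. The total inertia (the sum of `inertia`) equals the table's chi-square statistic divided by the total count, which gives a quick sanity check on the decomposition.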

Abstract

Sports visualization focuses on the use of structured data, such as box-score data and tracking data. Unstructured data sources pertaining to sports are available in various places, such as blogs, social media posts, and online news articles. Sports visualization methods have either not fully exploited the information present in these sources, or the visualizations proposed from these sources have not added to the body of sports visualization methods. We propose the use of unstructured data, namely cricket short text commentary, for visualization. The short text commentary data is used to construct each individual player's strength rules and weakness rules. A computationally feasible definition of a player's strength rule and weakness rule is proposed. A visualization method for the constructed rules is presented. In addition, players having similar strength rules or weakness rules are identified and visualized. We demonstrate the usefulness of short text commentary in visualization by analyzing the strengths and weaknesses of cricket players using more than one million text commentaries. We validate the constructed rules through two validation methods. The collected data, source code, and obtained results on more than 500 players are made publicly available.

Paper Structure

This paper contains 22 sections, 8 figures, 4 tables, 2 algorithms.

Figures (8)

  • Figure 1: System Overview.
  • Figure 2: Steps of Confrontation Matrix Construction.
  • Figure 3: Inner Products between $F_{attacked}$ or $F_{beaten}$ and all Bowling Vectors.
  • Figure 4: Smith's Response on Various Deliveries.
  • Figure 5: Visualization of Similar Batsmen based on their Strength Rule using t-SNE Plot.
  • ...and 3 more figures

Theorems & Definitions (5)

  • Definition 1
  • Definition 2
  • Definition 3
  • Definition 4
  • Definition 5