
Sportify: Question Answering with Embedded Visualizations and Personified Narratives for Sports Video

Chunggi Lee, Tica Lin, Hanspeter Pfister, Chen Zhu-Tian

TL;DR

Sportify addresses fan comprehension of basketball tactics by integrating embedded visualizations with LLM-generated narratives in a Visual Question Answering framework. The system detects tactics and actions from video data, renders three action visualizations (Pass, Cut, Screen), and presents narratives from first- or third-person perspectives using a retrieval-augmented generation pipeline with ReAct prompting. Two user studies show that embedded visuals and narratives improve understanding and engagement compared with text alone or YouTube tactic videos, with first-person narration boosting enjoyment and third-person narration providing a sense of control. The work advances grounded, interactive tactic explanations for sports videos and highlights opportunities for future multi-modal LLM integration and coverage of defensive tactics.
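The TL;DR mentions a retrieval-augmented generation pipeline driven by ReAct prompting, in which a model alternates between reasoning ("Thought"), tool calls ("Action"), and tool results ("Observation") before answering. The following is a minimal sketch of that loop, not Sportify's implementation. The LLM is replaced by a scripted stand-in (scripted_llm), the corpus TACTIC_DOCS and the retrieve tool are hypothetical, and the Thought/Action/Final Answer text format follows the general ReAct convention rather than the paper's actual prompts.

```python
from typing import Callable

# Hypothetical retrieval corpus standing in for the RAG knowledge base.
TACTIC_DOCS = {
    "pick and roll": "A ball handler uses a teammate's screen, and the screener rolls to the basket.",
    "give and go": "A player passes, cuts toward the basket, and receives a return pass.",
}


def retrieve(query: str) -> str:
    """Toy retriever: return the first document whose tactic name appears in the query."""
    q = query.lower()
    for name, doc in TACTIC_DOCS.items():
        if name in q:
            return doc
    return "No matching tactic found."


def scripted_llm(transcript: str) -> str:
    """Scripted stand-in for an LLM: request retrieval once, then answer from the observation."""
    if "Observation:" not in transcript:
        return "Thought: I should look up the detected tactic.\nAction: retrieve[pick and roll]"
    observation = transcript.rsplit("Observation:", 1)[1].strip()
    return f"Thought: The retrieved context answers the question.\nFinal Answer: {observation}"


def react(question: str, llm: Callable[[str], str], max_steps: int = 3) -> str:
    """Run a Thought -> Action -> Observation loop until the model gives a final answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        step = llm(transcript)
        transcript += step + "\n"
        if "Final Answer:" in step:
            return step.split("Final Answer:", 1)[1].strip()
        if "Action: retrieve[" in step:
            # Parse the tool argument and append the tool result as an Observation.
            arg = step.split("Action: retrieve[", 1)[1].split("]", 1)[0]
            transcript += f"Observation: {retrieve(arg)}\n"
    return "No answer within the step budget."


print(react("Why did the guard drive right after the screen?", scripted_llm))
```

In a real system, scripted_llm would be a call to a language model and retrieve would query an index of tactic descriptions and game data; the loop structure is the part this sketch is meant to illustrate.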

Abstract

As basketball's popularity surges, fans often find themselves confused and overwhelmed by the rapid pace and complexity of the game. Basketball tactics, which involve complex series of actions, require substantial knowledge to be fully understood. This complexity creates a need for additional information and explanation, which can distract fans from the game. To tackle these challenges, we present Sportify, a Visual Question Answering system that integrates narratives and embedded visualizations to demystify basketball tactical questions, helping fans understand various aspects of the game. We propose three novel action visualizations (i.e., Pass, Cut, and Screen) to demonstrate critical action sequences. To explain the reasoning and logic behind players' actions, we leverage a large language model (LLM) to generate narratives. We adopt a storytelling approach for complex scenarios from both first- and third-person perspectives, integrating the action visualizations. We evaluated Sportify with basketball fans to investigate its impact on tactical understanding and how different narrative perspectives affect the understanding of complex tactics with action visualizations. Our evaluation demonstrates Sportify's capability to deepen tactical insights and enhance the viewing experience. Furthermore, third-person narration helps fans obtain in-depth game explanations, while first-person narration enhances their engagement with the game.
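The abstract describes narratives generated from either a first- or third-person perspective over the same detected actions. A minimal sketch of how such a perspective switch might be expressed as a prompt template is shown below. The function build_narrative_prompt, its parameters, and all template wording are hypothetical illustrations, not the prompts used by Sportify.

```python
from typing import List, Optional


def build_narrative_prompt(
    tactic: str,
    actions: List[str],
    perspective: str,
    player: Optional[str] = None,
) -> str:
    """Assemble a narrative prompt in a first- or third-person voice (hypothetical template)."""
    if perspective == "first":
        if player is None:
            raise ValueError("first-person narration needs a player to speak as")
        voice = f"You are {player}. Explain the play in the first person, using 'I' and 'my teammate'."
    elif perspective == "third":
        voice = "You are a basketball commentator. Explain the play in the third person."
    else:
        raise ValueError(f"unknown perspective: {perspective!r}")
    # Number the detected actions so the model can refer to them in order.
    steps = "\n".join(f"{i}. {a}" for i, a in enumerate(actions, start=1))
    return (
        f"{voice}\n"
        f"Tactic: {tactic}\n"
        f"Detected actions:\n{steps}\n"
        "Explain the reasoning behind each action and how it creates a scoring opportunity."
    )


actions = [
    "Player A passes to Player B",
    "Player A cuts toward the basket",
    "Player B passes back to Player A",
]
print(build_narrative_prompt("Give and go", actions, "first", player="Player A"))
```

Keeping the tactic and action list identical across both voices means only the narrator changes, which mirrors the comparison of perspectives described in the abstract.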

Paper Structure

This paper contains 33 sections, 7 figures, and 2 tables.

Figures (7)

  • Figure 1: The pipeline efficiently addresses both tactical and performance-based questions. It begins with data processing (A-1), where videos undergo a computer vision (CV) pipeline to identify players' coordinates, bounding boxes, and the ball's location. This information feeds into action detection (A-2) and tactic classification (A-3), generating tactical textual information for the narrative agent (B-1). Player coordinates and LLM responses are visually embedded (C) and displayed in the video (D). Performance-related queries are handled by the LLM, which retrieves data to provide text-based answers (D).
  • Figure 2: An action list displays the series of actions performed by offensive players, including Pass, Cut, Screen, and Shoot. Primary actions are identified based on ball movement or on movements that enhance scoring opportunities, such as Shoot or Pass. To extract the primary actions related to Pass and Shoot, we set criteria that filter out secondary actions such as Cuts and Screens, marked as actions 1 and 2 in red circles.
  • Figure 3: The same tactic explained from different narrative perspectives.
  • Figure 4: The iterative design process for the action visualizations (i.e., Pass, Cut, and Screen). From P1 to P4, we remove occlusion and highlight the two players who send and receive the ball. For the Cut, we indicate the exact location to which a player will move with a flash-forward animation (C1 to C2), while the Screen is rendered as a wall so that a player setting a screen can be easily identified (S1 to S2).
  • Figure 5: (A) shows the third-person perspective narrative, presented like commentary, whereas (B) demonstrates the first-person perspective, integrating the action visualizations and narratives around the players to make viewers more engaged and immersed.
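The Figure 2 caption describes separating primary actions (Pass, Shoot) from secondary ones (Cut, Screen) in a detected action list. A minimal sketch of that split follows, assuming the simplest possible criterion: an action is primary if its type moves the ball or scores. The paper's actual filtering criteria may be richer; the Action record, its frame field, and split_actions are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumption: actions that move the ball or directly score are primary.
PRIMARY_KINDS = {"Pass", "Shoot"}


@dataclass(frozen=True)
class Action:
    frame: int   # hypothetical: frame index where the action starts
    player: str  # offensive player performing the action
    kind: str    # one of "Pass", "Cut", "Screen", "Shoot"


def split_actions(actions: List[Action]) -> Tuple[List[Action], List[Action]]:
    """Order actions in time, then separate primary (ball-related) from secondary ones."""
    ordered = sorted(actions, key=lambda a: a.frame)
    primary = [a for a in ordered if a.kind in PRIMARY_KINDS]
    secondary = [a for a in ordered if a.kind not in PRIMARY_KINDS]
    return primary, secondary


sequence = [
    Action(120, "P2", "Screen"),
    Action(95, "P1", "Pass"),
    Action(140, "P3", "Cut"),
    Action(180, "P1", "Pass"),
    Action(210, "P3", "Shoot"),
]
primary, secondary = split_actions(sequence)
print([(a.player, a.kind) for a in primary])
```

Sorting by frame first keeps both lists in playback order, so the secondary Cuts and Screens can still be shown as context around the primary ball movements.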