SoccerChat: Integrating Multimodal Data for Enhanced Soccer Game Understanding

Sushant Gautam, Cise Midoglu, Vajira Thambawita, Michael A. Riegler, Pål Halvorsen, Mubarak Shah

TL;DR

SoccerChat addresses the need for holistic soccer game understanding by integrating visual, textual, and audio cues into a multimodal conversational AI. It enriches the SoccerNet dataset with jersey-color annotations and ASR transcripts and trains a Qwen2-VL-based model on a large video-instruction dataset to improve event comprehension and referee decision tasks. Experimental results show that joint training on SoccerChat and XFoul data yields superior performance on referee-related tasks and action classification, while sequential fine-tuning can reduce generalization. The work provides dataset, model weights, evaluation code, and prompts to support reproducibility and highlights the potential of multimodal AI for real-time, explainable sports analytics.

Abstract

The integration of artificial intelligence in sports analytics has transformed soccer video understanding, enabling real-time, automated insights into complex game dynamics. Traditional approaches rely on isolated data streams, limiting their effectiveness in capturing the full context of a match. To address this, we introduce SoccerChat, a multimodal conversational AI framework that integrates visual and textual data for enhanced soccer video comprehension. Leveraging the extensive SoccerNet dataset, enriched with jersey color annotations and automatic speech recognition (ASR) transcripts, SoccerChat is fine-tuned on a structured video instruction dataset to facilitate accurate game understanding, event classification, and referee decision making. We benchmark SoccerChat on action classification and referee decision-making tasks, demonstrating its performance in general soccer event comprehension while maintaining competitive accuracy in referee decision making. Our findings highlight the importance of multimodal integration in advancing soccer analytics, paving the way for more interactive and explainable AI-driven sports analysis. https://github.com/simula/SoccerChat

Paper Structure

This paper contains 36 sections, 7 figures, 4 tables.

Figures (7)

  • Figure 1: Pipeline that processes different dataset modalities to generate question-answer (QA) pairs for the SoccerChat instruction dataset.
  • Figure 2: SoccerChat model based on Qwen2-VL, illustrating the integration of visual and textual data for enhanced soccer video comprehension.
  • Figure 3: Score distribution of models for referee decision tasks, shown using violin plots with quartiles and median indicators.
  • Figure 4: Score distribution of models for six-class (left) and sixteen-class (right) classification tasks.
  • Figure 5: Confusion matrix for the six-class classification task.
  • ...and 2 more figures