Language and Multimodal Models in Sports: A Survey of Datasets and Applications
Haotian Xia, Zhengbang Yang, Yun Zhao, Yuqing Wang, Jingxi Li, Rhys Tracy, Zhuangdi Zhu, Yuan-fang Wang, Hanjie Chen, Weining Shen
TL;DR
This survey synthesizes NLP and multimodal resources in sports by organizing post-2020 datasets into three categories: language-based, multimodal, and convertible. It catalogs a broad set of datasets and associated tasks, from game prediction and NER to sports QA and video understanding, highlights key benchmarks such as SportsSum, SportQA, QASports, and VREN, and discusses implications for real-time processing and personalization. The work identifies core challenges (data quality, diversity, privacy, and multimodal integration) and outlines opportunities for enthusiasts, professionals, and medical and rehabilitation applications. By mapping datasets to concrete applications and future directions, the paper provides a foundational resource for researchers and practitioners aiming to advance NLP and multimodal models in sports.
Abstract
The recent integration of Natural Language Processing (NLP) and multimodal models has advanced the field of sports analytics. This survey presents a comprehensive review of the datasets and applications driving these innovations post-2020. We review the datasets and categorize them into three primary types: language-based, multimodal, and convertible. Language-based datasets support text-only tasks, while multimodal datasets support tasks that combine several modalities (e.g., text, video, audio). Convertible datasets, initially single-modal (video), can be enriched with additional annotations, such as explanations of actions and video descriptions, to become multimodal, offering future potential for richer and more diverse applications. Our study highlights the contributions of these datasets to various applications, from improving fan experiences to supporting tactical analysis and medical diagnostics. We also discuss the challenges and future directions in dataset development, emphasizing the need for diverse, high-quality data to support real-time processing and personalized user experiences. This survey provides a foundational resource for researchers and practitioners aiming to leverage NLP and multimodal models in sports, offering insights into current trends and future opportunities in the field.
