Deep Understanding of Soccer Match Videos

Shikun Xu, Yandong Zhu, Gen Li, Changhu Wang

TL;DR

The paper tackles frame-level understanding of live soccer match videos, where players and the ball are small and fast-moving and camera views vary. It proposes an end-to-end vision system that combines SSD-based small-object detection, DeepSORT-based multi-object tracking, jersey-number recognition with Spatial Transformer Networks and semi-supervised multitask learning, HSV-based segmentation with team clustering, ResNet-based highlight detection, and two-stage keypoint detection with a homography mapping to an overhead field map. Key contributions include robust small-object detection (mAP ≈ 78%), identity continuity across tracked frames, accurate jersey-number recognition under pose variation, precise localization of 17 field keypoints, and real-time generation of highlights and tactical visuals. In practice, this enables data-rich, real-time analysis and visualization for fans and analysts during live soccer matches.
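The homography step can be illustrated with a minimal sketch. It assumes that at least four field keypoints have already been detected in the image and that their positions on the overhead map are known. The function names, the pixel coordinates, and the 105 m x 68 m pitch size are illustrative assumptions, not details from the paper, and the plain direct linear transform used here may differ from the authors' estimation method.

```python
import numpy as np


def estimate_homography(src, dst):
    """Estimate H with dst ~ H @ src (homogeneous) by the direct linear transform.

    src, dst: sequences of at least four (x, y) correspondences.
    """
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        # Each correspondence contributes two linear constraints on the 9 entries of H.
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    # The solution is the right singular vector with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    h = vt[-1].reshape(3, 3)
    return h / h[2, 2]


def project(h, points):
    """Map (x, y) image points to field-map coordinates using homography h."""
    pts = np.asarray(points, dtype=float)
    homog = np.hstack([pts, np.ones((len(pts), 1))])
    mapped = homog @ h.T
    return mapped[:, :2] / mapped[:, 2:3]


# Hypothetical keypoints: pixel positions of the four pitch corners in one frame
# and their positions on an assumed 105 m x 68 m overhead map.
image_pts = [(320, 180), (960, 180), (1240, 660), (40, 660)]
field_pts = [(0, 0), (105, 0), (105, 68), (0, 68)]

H = estimate_homography(image_pts, field_pts)
player_on_map = project(H, [(640, 400)])  # a player's foot position in pixels
```

With more than four detected keypoints, the same least-squares formulation uses all of them, which is one reason a system might detect many keypoints rather than the minimum of four.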

Abstract

Soccer is one of the most popular sports worldwide, with live broadcasts frequently available for major matches. However, extracting detailed, frame-by-frame information on player actions from these videos remains a challenge. Utilizing state-of-the-art computer vision technologies, our system can detect key objects such as soccer balls, players and referees. It also tracks the movements of players and the ball, recognizes player numbers, classifies scenes, and identifies highlights such as goal kicks. By analyzing live TV streams of soccer matches, our system can generate highlight GIFs, tactical illustrations, and diverse summary graphs of ongoing games. Through these visual recognition techniques, we deliver a comprehensive understanding of soccer game videos, enriching the viewer's experience with detailed and insightful analysis.

Paper Structure (9 sections, 4 figures)

This paper contains 9 sections, 4 figures.

Figures (4)

  • Figure 1: Parsing results, including player detection and tracking, jersey number recognition, and team clustering.
  • Figure 2: Illustration of the detection and tracking results. The bounding boxes represent detection results. The colored tail shows each player's positions over the last 10 frames, as computed by the tracking algorithm.
  • Figure 3: Illustration of the segmentation, team clustering, and view transform results. The translucent mask shows each player's segmentation result, and the mask color indicates the team, with referees shown in their own color. The field map in the bottom-left corner shows the camera's field of view after transformation from the original image.
  • Figure 4: Illustration of the game summary, including ball control rates, the ball position heatmap, and each team's control distribution over the field.
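Summary statistics like those in Figure 4 could be derived from per-frame tracking output roughly as sketched here. The sketch assumes the ball position has already been mapped to field coordinates in metres and that each frame carries a label for the team in control of the ball (or None). The input format, the function names, the 105 m x 68 m pitch size, and the grid resolution are assumptions made for illustration, not details from the paper.

```python
from collections import Counter

import numpy as np

FIELD_LENGTH_M = 105.0  # assumed pitch length, not taken from the paper
FIELD_WIDTH_M = 68.0    # assumed pitch width, not taken from the paper


def control_rates(team_per_frame):
    """Fraction of labelled frames in which each team controls the ball."""
    labelled = [team for team in team_per_frame if team is not None]
    if not labelled:
        return {}
    counts = Counter(labelled)
    return {team: n / len(labelled) for team, n in counts.items()}


def ball_heatmap(ball_xy, bins=(21, 14)):
    """Normalized 2D histogram of ball positions over the field grid."""
    xy = np.asarray(ball_xy, dtype=float)
    heat, _, _ = np.histogram2d(
        xy[:, 0],
        xy[:, 1],
        bins=bins,
        range=[[0.0, FIELD_LENGTH_M], [0.0, FIELD_WIDTH_M]],
    )
    total = heat.sum()
    # Avoid dividing by zero when every sample falls outside the field.
    return heat / total if total > 0 else heat


# Hypothetical per-frame records.
rates = control_rates(["A", "A", "B", None])
heat = ball_heatmap([(10.0, 10.0), (10.0, 10.0), (100.0, 60.0)])
```

Restricting the heatmap to frames controlled by one team would give a per-team control distribution over the field in the same way.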