
Runner re-identification from single-view running video in the open-world setting

Tomohiro Suzuki, Kazushi Tsutsui, Kazuya Takeda, Keisuke Fujii

TL;DR

This paper addresses open-world runner re-identification from single-view running videos without relying on labeled data for feature extraction. It presents a five-step pipeline: segmentation, runner classification, tracking, shoe detection, and unsupervised re-identification that combines GRU autoencoder dynamics with HHCL global/local features and color histograms. Key contributions include integrating dynamic sequence features and local shoe cues to outperform several unsupervised baselines, and demonstrating robust automatic processing from raw video in daytime conditions. The approach enables automated analysis of running videos in daily practice, with practical implications for sports analytics and training assessment.

Abstract

In many sports, player re-identification is crucial for automatic video processing and analysis. However, most current studies on player re-identification in multi- or single-view sports videos focus on the closed-world setting using labeled image datasets, and player re-identification in the open-world setting for automatic video analysis is not well developed. In this paper, we propose a runner re-identification system that directly processes single-view video to address the open-world setting. In the open-world setting, we cannot use a labeled dataset and have to process video directly. The proposed system automatically processes raw video as input to identify runners, and it can identify runners even when they leave the frame multiple times. For the automatic processing, we first detect the runners in the video using the pre-trained YOLOv8 and the fine-tuned EfficientNet. We then track the runners using ByteTrack and detect their shoes with the fine-tuned YOLOv8. Finally, we extract the image features of the runners using an unsupervised method with a gated recurrent unit autoencoder and global and local feature mixing. To improve the accuracy of runner re-identification, we use shoe images as local image features and extract dynamic features from running image sequences. We evaluated the system on a running practice video dataset and showed that the proposed method identified runners with higher accuracy than some state-of-the-art models in unsupervised re-identification. We also showed that our proposed local image feature and running dynamic feature were effective for runner re-identification. Our runner re-identification system can be useful for the automatic analysis of running videos.
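
The final step described in the abstract, linking tracklets of the same runner after they leave and re-enter the frame, can be illustrated with a minimal sketch. It assumes each tracklet already has a global feature, a shoe feature, and a dynamic feature as plain lists of floats. The fusion by weighted concatenation, the `weights` and `threshold` values, and the union-find grouping are all hypothetical choices made for illustration; they are not taken from the paper.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec] if norm > 0 else list(vec)


def fuse_features(global_feat, shoe_feat, dynamic_feat, weights=(1.0, 1.0, 1.0)):
    """Concatenate the three normalized, weighted cues (hypothetical fusion scheme)."""
    fused = []
    for feat, w in zip((global_feat, shoe_feat, dynamic_feat), weights):
        fused.extend(w * v for v in l2_normalize(feat))
    return l2_normalize(fused)


def cosine_similarity(a, b):
    """Dot product; equals cosine similarity because inputs are unit length."""
    return sum(x * y for x, y in zip(a, b))


def link_tracklets(descriptors, threshold=0.8):
    """Give the same identity label to tracklets whose descriptors are similar.

    Uses union-find (single linkage); the threshold is an assumed value.
    """
    parent = list(range(len(descriptors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(descriptors)):
        for j in range(i + 1, len(descriptors)):
            if cosine_similarity(descriptors[i], descriptors[j]) >= threshold:
                parent[find(j)] = find(i)

    # Relabel group roots as consecutive identities in order of first appearance.
    labels, mapping = [], {}
    for i in range(len(descriptors)):
        root = find(i)
        mapping.setdefault(root, len(mapping))
        labels.append(mapping[root])
    return labels


# Toy data: tracklets 0 and 2 are the same runner seen on two separate passes.
tracklets = [
    fuse_features([1.0, 0.0], [0.9, 0.1], [0.2, 0.8]),
    fuse_features([0.0, 1.0], [0.1, 0.9], [0.9, 0.1]),
    fuse_features([0.95, 0.05], [0.85, 0.15], [0.25, 0.75]),
]
print(link_tracklets(tracklets))  # prints [0, 1, 0]
```

Because the number of runners is unknown in the open-world setting, the sketch groups tracklets by a similarity threshold rather than assigning them to a fixed set of identities.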

Paper Structure (19 sections, 2 equations, 7 figures, 2 tables)

Figures (7)

  • Figure 1: Overview of the two re-identification tasks. Re-identification in the closed-world setting can use a labeled image dataset, but it is not directly applicable to real-world general sports video analysis. In contrast, re-identification in the open-world setting, which our proposed method addresses, can directly process raw video and does not require labeled data for feature extraction.
  • Figure 2: Our proposed runner re-identification system flow. In Section 3.1, we describe the segmentation and cropping model, and in Section 3.2, we describe the EfficientNet fine-tuned for runner classification. In Sections 3.3 and 3.4, we describe the tracking and the shoe detection separately. In Section 3.5, we describe the runner re-identification.
  • Figure 3: Example of tracking results. Representative images and a shoe image are obtained for each runner. Note that a different ID is assigned to the same person each time they reappear on screen.
  • Figure 4: Model structure of the GRU AE. The encoder receives sequential images of running motion and outputs a 128-dimensional latent variable. The decoder receives the latent variable and the sequential images in reverse order, and for each input image outputs a reconstruction of the image one frame earlier.
  • Figure 5: Examples of images misclassified as runners. Although we achieved a $99.8\%$ F1-score, most of the misclassifications were images of other athletes who were running.
  • ...and 2 more figures
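
The input/target arrangement described in the Figure 4 caption can be sketched as a small data-preparation helper. It is an illustration only: the function name `build_gru_ae_pairs` is hypothetical, and it assumes the first frame is dropped from the decoder inputs because it has no preceding frame to reconstruct. The paper may handle that boundary differently.

```python
def build_gru_ae_pairs(frames):
    """Return (encoder_inputs, decoder_inputs, decoder_targets) for one sequence.

    frames: per-frame items (images or feature vectors) in temporal order.
    """
    if len(frames) < 2:
        raise ValueError("need at least two frames to form a (frame, previous frame) pair")
    encoder_inputs = list(frames)                   # x_1, ..., x_T in temporal order
    decoder_inputs = list(reversed(frames[1:]))     # x_T, ..., x_2 in reverse order
    decoder_targets = list(reversed(frames[:-1]))   # x_{T-1}, ..., x_1 (one frame earlier)
    return encoder_inputs, decoder_inputs, decoder_targets


enc, dec_in, dec_target = build_gru_ae_pairs(["f1", "f2", "f3", "f4"])
print(dec_in)      # prints ['f4', 'f3', 'f2']
print(dec_target)  # prints ['f3', 'f2', 'f1']
```

Each decoder target is the frame immediately preceding the corresponding decoder input, so the decoder walks the sequence backwards from the latent variable.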