
Multi-task Learning for Joint Re-identification, Team Affiliation, and Role Classification for Sports Visual Tracking

Amir M. Mansourian, Vladimir Somers, Christophe De Vleeschouwer, Shohreh Kasaei

TL;DR

This work tackles the integrated problem of tracking, re-identification, and semantic labeling (role and team) in sports videos by introducing PRTreID, a multi-task, part-based representation learned on a single backbone. By adding role classification and team affiliation heads to a strong part-based ReID baseline, the model yields richer embeddings that improve both identification and clustering across teams, including unseen ones. The authors couple PRTreID with a StrongSORT-inspired tracker (PRT-Track), replacing global appearance features with part-based embeddings and adding online EMA updates and offline tracklet merging to achieve state-of-the-art results on SoccerNet Tracking. They release their dataset and code to promote joint representation learning for sports analytics, and discuss limitations and future directions such as Jersey Number Recognition for further robustness in visually similar kits.
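The online EMA update mentioned above can be pictured with a minimal sketch. It assumes each tracklet stores one embedding per body part and blends in the matched detection's embedding with an exponential moving average. The function names, the dictionary layout, the visibility gating, and the `alpha=0.9` default are illustrative assumptions, not details confirmed by this summary.

```python
from math import sqrt


def l2_normalize(vec):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def ema_update(track_parts, det_parts, det_visible, alpha=0.9):
    """Blend per-part detection embeddings into a tracklet's running embeddings.

    track_parts: dict part name -> running embedding (hypothetical layout)
    det_parts:   dict part name -> embedding of the matched detection
    det_visible: dict part name -> bool, whether the part is visible (assumed)
    alpha:       EMA momentum; larger values favor the tracklet history
    """
    updated = dict(track_parts)
    for part, emb in det_parts.items():
        if not det_visible.get(part, False):
            continue  # assumed gating: an occluded part keeps its old embedding
        if part not in updated:
            updated[part] = l2_normalize(emb)  # first observation of this part
        else:
            prev = updated[part]
            mixed = [alpha * p + (1.0 - alpha) * e for p, e in zip(prev, emb)]
            updated[part] = l2_normalize(mixed)
    return updated
```

Skipping invisible parts is one plausible way to keep occluded observations from contaminating a tracklet's stored appearance; the actual tracker may handle visibility differently.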

Abstract

Effective tracking and re-identification of players is essential for analyzing soccer videos. However, it is a challenging task due to the non-linear motion of players, the similar appearance of players from the same team, and frequent occlusions. Therefore, the ability to extract meaningful embeddings to represent players is crucial for developing an effective tracking and re-identification system. In this paper, a multi-purpose part-based person representation method, called PRTreID, is proposed that simultaneously performs three tasks: role classification, team affiliation, and re-identification. In contrast to the available literature, a single network is trained with multi-task supervision to solve all three tasks jointly. The proposed joint method is computationally efficient due to the shared backbone. In addition, multi-task learning leads to richer and more discriminative representations, as demonstrated by both quantitative and qualitative results. To demonstrate the effectiveness of PRTreID, it is integrated with a state-of-the-art tracking method, using a part-based post-processing module to handle long-term tracking. The proposed tracking method outperforms all existing tracking methods on the challenging SoccerNet tracking dataset.

Paper Structure

This paper contains 27 sections, 8 equations, 5 figures, 6 tables.

Figures (5)

  • Figure 1: Automated analytics for team sports requires tracking, re-identification, role classification (e.g. player, referee, staff, ...), and team affiliations (such as Team A or Team B) of all detected persons throughout an entire video of a game.
  • Figure 2: Diagram of the proposed PRTreID method. An input image is fed into a shared backbone, which outputs an embedding for each body part. The foreground mask is created by combining all part embeddings. The re-identification objective is trained with triplet loss and cross-entropy loss on the body-part embeddings. Additionally, the team affiliation and role classification objectives are trained with triplet loss and focal loss, respectively, on the foreground embedding. At inference time, the resulting multi-purpose embeddings can be used for person re-identification, team affiliation, and role classification.
  • Figure 3: t-SNE visualization of player embeddings in a 2D space for a specific video, with and without multi-task training of the model. The proposed multi-task model improves the clustering of players by team.
  • Figure 4: Visualizations of two images from the proposed ReID dataset and their attention maps for the foreground and each body part.
  • Figure 5: Visualization of three images from the query set and their top-5 retrieved images from the gallery set of the proposed ReID dataset, along with foreground masks. Blue marks the query sample, red marks a wrongly retrieved image, and green marks a correctly retrieved image from the gallery.
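
The objectives named in the Figure 2 caption can be sketched in plain Python. The pairing of losses with tasks follows the caption: triplet plus cross-entropy for re-identification, triplet for team affiliation, and focal loss for role classification. The function names, the margin and gamma defaults, and the weighted-sum combination with weights `w_reid`, `w_team`, `w_role` are assumptions made for illustration, not the authors' implementation.

```python
from math import log, sqrt

EPS = 1e-12  # guards against log(0)


def cross_entropy(probs, target):
    """Cross-entropy for one sample, given class probabilities."""
    return -log(max(probs[target], EPS))


def focal_loss(probs, target, gamma=2.0):
    """Focal loss: cross-entropy down-weighted for well-classified samples."""
    p_t = max(probs[target], EPS)
    return -((1.0 - p_t) ** gamma) * log(p_t)


def euclidean(a, b):
    """Euclidean distance between two embeddings."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_loss(anchor, positive, negative, margin=0.3):
    """Hinge on (anchor-positive distance) - (anchor-negative distance) + margin."""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


def multitask_loss(reid_triplet, reid_ce, team_triplet, role_focal,
                   w_reid=1.0, w_team=1.0, w_role=1.0):
    """Assumed combination: a weighted sum of the three task objectives."""
    return (w_reid * (reid_triplet + reid_ce)
            + w_team * team_triplet
            + w_role * role_focal)
```

In this sketch the re-identification terms would be computed on body-part embeddings with identity labels, while the team triplet and role focal terms would be computed on the foreground embedding with team and role labels.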