Group Activity Recognition using Unreliable Tracked Pose

Haritha Thilakarathne, Aiden Nibali, Zhen He, Stuart Morgan

TL;DR

This work tackles robust group activity recognition when per-person tracking is unreliable. It introduces RePGARS, which renders individual poses into color-coded images and fuses them with RGB frames to form a 6-channel input for a 3D CNN classifier, enabling end-to-end learning that is tolerant of tracking errors. In experiments on Volleyball and the Australian Netball Video Dataset, RePGARS outperforms baselines that do not use ground-truth tracking, degrades only minimally relative to ground-truth tracking, and significantly outperforms prior pose-based methods under realistic conditions. The paper also introduces the Australian Netball Video Dataset as a resource for evaluating sports-related group activity recognition, with practical relevance to real-world sports analytics.

Abstract

Group activity recognition in video is a complex task due to the need for a model to recognise the actions of all individuals in the video and their complex interactions. Recent studies propose that optimal performance is achieved by individually tracking each person and subsequently inputting the sequence of poses or cropped images/optical flow into a model. This helps the model to recognise what actions each person is performing before they are merged to arrive at the group action class. However, all previous models are highly reliant on high quality tracking and have only been evaluated using ground truth tracking information. In practice it is almost impossible to achieve highly reliable tracking information for all individuals in a group activity video. We introduce an innovative deep learning-based group activity recognition approach called Rendered Pose based Group Activity Recognition System (RePGARS) which is designed to be tolerant of unreliable tracking and pose information. Experimental results confirm that RePGARS outperforms all existing group activity recognition algorithms tested which do not use ground truth detection and tracking information.
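
The 6-channel input mentioned in the TL;DR (a rendered pose image fused with each RGB frame) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `render_pose_frame` and `build_clip`, the colour palette, the use of 3x3 dots for joints, and the choice to colour each person by tracking identifier are all assumptions made for illustration.

```python
import numpy as np


def render_pose_frame(tracks, height, width, palette):
    """Draw each tracked person's joints as small dots on a black canvas.

    `tracks` maps a tracking ID to a list of (x, y) joint positions.
    Colouring by tracking ID is an assumption made for this sketch.
    """
    canvas = np.zeros((height, width, 3), dtype=np.uint8)
    for track_id, joints in tracks.items():
        colour = palette[track_id % len(palette)]
        for x, y in joints:
            xi, yi = int(round(x)), int(round(y))
            # Skip joints that fall outside the frame (e.g. bad detections).
            if 0 <= xi < width and 0 <= yi < height:
                canvas[max(yi - 1, 0):yi + 2, max(xi - 1, 0):xi + 2] = colour
    return canvas


def build_clip(rgb_frames, tracks_per_frame, palette):
    """Stack RGB and rendered-pose channels into a (6, T, H, W) float clip."""
    fused = []
    for rgb, tracks in zip(rgb_frames, tracks_per_frame):
        height, width, _ = rgb.shape
        pose = render_pose_frame(tracks, height, width, palette)
        # Channels 0-2 hold the RGB frame, channels 3-5 the rendered pose.
        fused.append(np.concatenate([rgb, pose], axis=-1))
    clip = np.stack(fused)               # (T, H, W, 6)
    clip = clip.transpose(3, 0, 1, 2)    # (6, T, H, W), a common 3D CNN layout
    return clip.astype(np.float32) / 255.0
```

Because the pose is drawn into an image rather than passed as a fixed-length per-person sequence, a missing or out-of-frame joint simply leaves pixels black, which is one plausible reading of why such an input can tolerate tracking errors.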

Paper Structure (17 sections, 9 figures, 3 tables)

Figures (9)

  • Figure 1: Sample video frames from varied venues in the Australian Netball Video dataset.
  • Figure 2: Distribution of event classes annotated in the Australian Netball video dataset.
  • Figure 3: Examples of unreliable tracking scenarios.
  • Figure 4: Late person fusion approach. Individuals shown in the same colour across frames belong to a single tracking identifier.
  • Figure 5: Early person fusion approach. Individuals shown in the same colour across frames belong to a single tracking identifier.
  • ...and 4 more figures