AdaFPP: Adapt-Focused Bi-Propagating Prototype Learning for Panoramic Activity Recognition

Meiqi Cao, Rui Yan, Xiangbo Shu, Guangzhao Dai, Yazhou Yao, Guo-Sen Xie

TL;DR

Panoramic Activity Recognition (PAR) requires recognizing individual, group, and global activities in crowded panoramic scenes containing occluded persons of varying size. The authors propose AdaFPP, an all-in-one framework that jointly optimizes a Panoramic Adapt-Focuser (PAF) detector and a Bi-Propagating Prototyper (BPP) that learns multi-granularity prototypes, enabling end-to-end learning. PAF performs coarse-to-fine, size-adapting detection in crowded scenes, while BPP uses Unified Bidirectional Encoding blocks to propagate information across the individual, group, and global levels in both the bottom-up and top-down directions. Experiments on the JRDB-PAR dataset show AdaFPP achieving strong PAR performance, with ablations confirming the complementary gains from PAF and BPP and qualitative analyses illustrating improved detection of small, occluded persons. The work advances practical PAR by reducing dependence on ground-truth boxes and mitigating information loss through closed-loop multi-granularity interactions, with potential impact on real-world surveillance and analysis systems.

Abstract

Panoramic Activity Recognition (PAR) aims to identify multi-granularity behaviors performed by multiple persons in panoramic scenes, including individual activities, group activities, and global activities. Previous methods 1) heavily rely on manually annotated detection boxes in training and inference, hindering further practical deployment; or 2) directly employ normal detectors to detect multiple persons with varying size and spatial occlusion in panoramic scenes, blocking the performance gain of PAR. To this end, we consider learning a detector that adapts to occluded persons of varying size and is optimized along with the recognition module in an all-in-one framework. Therefore, we propose a novel Adapt-Focused bi-Propagating Prototype learning (AdaFPP) framework to jointly recognize individual, group, and global activities in panoramic activity scenes by learning an adapt-focused detector and multi-granularity prototypes as the pretext tasks in an end-to-end way. Specifically, to accommodate the varying sizes and spatial occlusion of multiple persons in crowded panoramic scenes, we introduce a panoramic adapt-focuser, which achieves size-adapting detection of individuals by comprehensively selecting object-dense sub-regions identified through the original detections and performing fine-grained detections on them. In addition, to mitigate information loss due to inaccurate individual localizations, we introduce a bi-propagating prototyper that promotes closed-loop interaction and informative consistency across different granularities by facilitating bidirectional information propagation among the individual, group, and global levels. Extensive experiments demonstrate the strong performance of AdaFPP and emphasize its applicability to PAR.
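
The abstract describes the panoramic adapt-focuser as selecting object-dense sub-regions from the original detections and then detecting again inside them. As a rough illustration only, the selection step might be sketched as below. The fixed grid, the `min_count` threshold, and the function names `dense_subregions` and `to_full_image` are assumptions made for this sketch and are not taken from the paper, which may select regions quite differently.

```python
from collections import Counter


def dense_subregions(boxes, image_size, grid=(4, 2), min_count=3):
    """Return grid cells (x1, y1, x2, y2) that hold at least min_count box centers.

    Hypothetical stand-in for an object-dense region selector: the image is
    split into a fixed grid and cells are ranked by how many coarse
    detections fall inside them.
    """
    width, height = image_size
    cols, rows = grid
    cell_w, cell_h = width / cols, height / rows

    counts = Counter()
    for x1, y1, x2, y2 in boxes:
        cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
        # Clamp so that centers on the right/bottom edge stay inside the grid.
        col = min(int(cx // cell_w), cols - 1)
        row = min(int(cy // cell_h), rows - 1)
        counts[(col, row)] += 1

    dense = [(cell, n) for cell, n in counts.items() if n >= min_count]
    dense.sort(key=lambda item: -item[1])  # densest cells first
    return [
        (col * cell_w, row * cell_h, (col + 1) * cell_w, (row + 1) * cell_h)
        for (col, row), _ in dense
    ]


def to_full_image(box, region):
    """Shift a box detected inside a cropped region back to full-image coordinates."""
    rx, ry = region[0], region[1]
    x1, y1, x2, y2 = box
    return (x1 + rx, y1 + ry, x2 + rx, y2 + ry)
```

In such a scheme, each returned region would be cropped (and typically upsampled) before a second detection pass, and `to_full_image` would map the resulting fine-grained boxes back so they can be merged with the original detections.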

Paper Structure

This paper contains 20 sections, 19 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: Our solution and insights. Solution: all-in-one detection and recognition customized for panoramic activities performed by size-varying persons. Insights: i) adaptive detection for size-varying, occluded persons, replacing both ground-truth boxes (expensive to annotate) and normal detectors (suited to size-similar persons); and ii) bi-propagating interaction, replacing single-propagating interaction, as a closed-loop interaction that mitigates information loss due to inaccurate localizations.
  • Figure 2: Framework of the proposed AdaFPP. It consists of two crucial components, i.e., the Panoramic Adapt-Focuser (PAF) and the Bi-Propagating Prototyper (BPP). PAF comprehensively localizes individuals in crowded panoramic scenes by adaptively selecting regions from the original detections and performing fine-grained detections on them. BPP learns the multi-granularity prototypes by promoting closed-loop interaction in a bi-propagating manner. Finally, the detection and recognition heads are jointly used to optimize the whole model end to end.
  • Figure 3: Detailed architecture of one Unified Bidirectional Encoding (UBE) block. It includes the UME (bottom-up encoding) and CME (top-down encoding) modules. $l \in \{\text{patch}, \text{ind}, \text{group}\}$ denotes the current granularity level, and $l+1$ is the next higher granularity level above $l$.
  • Figure 4: Visualization of individual detections by PAF. The gray boxes indicate the original detections, and the red boxes indicate the additional fine-grained detections by PAF.
  • Figure 5: Visual comparison of activity recognition at three granularities. Incorrect recognition results are marked in red. (a) uses only bottom-up propagation. (b) uses both bottom-up and top-down propagation.
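
The captions for Figures 3 and 5 contrast bottom-up-only propagation with bottom-up plus top-down propagation across the individual, group, and global levels. A minimal sketch of that idea follows. It assumes mean pooling for the bottom-up step and an additive update weighted by `alpha` for the top-down step; these are stand-ins chosen for brevity, since the internals of the UME and CME modules are not described in this excerpt. The names `bottom_up`, `top_down`, and `alpha` are hypothetical.

```python
def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(vals) / n for vals in zip(*vectors)]


def bottom_up(individuals, groups):
    """Aggregate individual features into group features, then into one global feature.

    `groups` is a list of lists of indices into `individuals`.
    """
    group_feats = [mean([individuals[i] for i in members]) for members in groups]
    global_feat = mean(group_feats)
    return group_feats, global_feat


def top_down(individuals, groups, group_feats, global_feat, alpha=0.5):
    """Feed higher-level context back down: global -> group -> individual."""
    refined_groups = [
        [g + alpha * s for g, s in zip(feat, global_feat)] for feat in group_feats
    ]
    # Individuals that belong to no group are passed through unchanged.
    refined_inds = [list(v) for v in individuals]
    for members, feat in zip(groups, refined_groups):
        for i in members:
            refined_inds[i] = [x + alpha * g for x, g in zip(individuals[i], feat)]
    return refined_inds, refined_groups


individuals = [[1.0, 0.0], [3.0, 0.0], [0.0, 4.0]]
groups = [[0, 1], [2]]
group_feats, global_feat = bottom_up(individuals, groups)
refined_inds, refined_groups = top_down(individuals, groups, group_feats, global_feat)
```

Running only `bottom_up` corresponds loosely to setting (a) in Figure 5, while chaining both functions corresponds to setting (b), where each individual feature also carries group and scene context.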