CHASE: Learning Convex Hull Adaptive Shift for Skeleton-based Multi-Entity Action Recognition

Yuhang Wen, Mengyuan Liu, Songtao Wu, Beichen Ding

TL;DR

This work tackles the challenge of inter-entity distribution discrepancies in skeleton-based multi-entity action recognition by introducing CHASE, a backbone-wrapper that performs sample-adaptive coordinate shifts. CHASE combines the Implicit Convex Hull Constrained Adaptive Shift (ICHAS) with a lightweight Coefficient Learning Block (CLB) and an auxiliary Mini-batch Pair-wise Maximum Mean Discrepancy (MPMMD) objective to minimize cross-entity distribution gaps. The approach unbiases downstream backbones, yielding consistent performance gains across six benchmarks and multiple baselines, while adding only a small parameter overhead. By enabling a principled, convex-hull-constrained origin shift and discrepancy minimization, CHASE provides a practical and generalizable tool for improving multi-entity action recognition in skeletal data. Code is publicly available to facilitate reproducibility and adoption across related models and datasets.

Abstract

Skeleton-based multi-entity action recognition is a challenging task aiming to identify interactive actions or group activities involving multiple diverse entities. Existing models for individuals often fall short in this task due to the inherent distribution discrepancies among entity skeletons, leading to suboptimal backbone optimization. To this end, we introduce a Convex Hull Adaptive Shift based multi-Entity action recognition method (CHASE), which mitigates inter-entity distribution gaps and unbiases subsequent backbones. Specifically, CHASE comprises a learnable parameterized network and an auxiliary objective. The parameterized network achieves plausible, sample-adaptive repositioning of skeleton sequences through two key components. First, the Implicit Convex Hull Constrained Adaptive Shift ensures that the new origin of the coordinate system is within the skeleton convex hull. Second, the Coefficient Learning Block provides a lightweight parameterization of the mapping from skeleton sequences to their specific coefficients in convex combinations. Moreover, to guide the optimization of this network for discrepancy minimization, we propose the Mini-batch Pair-wise Maximum Mean Discrepancy as the additional objective. CHASE operates as a sample-adaptive normalization method to mitigate inter-entity distribution discrepancies, thereby reducing data bias and improving the subsequent classifier's multi-entity action recognition performance. Extensive experiments on six datasets, including NTU Mutual 11/26, H2O, Assembly101, Collective Activity and Volleyball, consistently verify our approach by seamlessly adapting to single-entity backbones and boosting their performance in multi-entity scenarios. Our code is publicly available at https://github.com/Necolizer/CHASE .
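
To make the discrepancy objective more concrete, the following is a minimal sketch of a generic squared Maximum Mean Discrepancy estimate between the point sets of two entities. It is not the paper's MPMMD: the RBF kernel, the `bandwidth` parameter, the biased estimator, and the function names `rbf_kernel` and `mmd_squared` are all assumptions chosen for illustration.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel matrix between rows of a (n, d) and b (m, d)."""
    sq_dists = (
        np.sum(a ** 2, axis=1)[:, None]
        + np.sum(b ** 2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clamp tiny negative values caused by floating-point round-off.
    sq_dists = np.maximum(sq_dists, 0.0)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd_squared(x, y, bandwidth=1.0):
    """Biased empirical estimate of MMD^2 between samples x (n, d) and y (m, d).

    Assumption: a single RBF kernel with a fixed bandwidth; the paper's
    mini-batch pair-wise variant may differ in kernel choice and pairing.
    """
    k_xx = rbf_kernel(x, x, bandwidth).mean()
    k_yy = rbf_kernel(y, y, bandwidth).mean()
    k_xy = rbf_kernel(x, y, bandwidth).mean()
    return k_xx + k_yy - 2.0 * k_xy


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two hypothetical entities whose joint coordinates differ by an offset.
    entity_a = rng.normal(size=(200, 3))
    entity_b = rng.normal(size=(200, 3)) + np.array([2.0, 0.0, 0.0])
    print("MMD^2 with offset:   ", mmd_squared(entity_a, entity_b))
    print("MMD^2 offset removed:", mmd_squared(entity_a, entity_b - [2.0, 0.0, 0.0]))
```

In this toy setting the estimate drops once the offset between the two entities is removed, which is the kind of gap a discrepancy-minimizing auxiliary objective would push the shift network to close.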

Paper Structure

This paper contains 26 sections, 1 theorem, 19 equations, 10 figures, 10 tables, 1 algorithm.

Key Result

Proposition 1

The implicit skeleton convex hull constrained adaptive shift vector is formulated as $\vec{p^*} = X\,\mathrm{softmax}(W)$, where $X\in \mathbb{R}^{C\times U}$, $W\in \mathbb{R}^{U\times 1}$, and $\vec{p^*}\in \mathbb{R}^{C\times 1}$. The vector $\vec{p^*}$ in this equation is an element of the set of all convex combinations of points in $X$. It is also a point that lies in the minimal convex set containing $X$, i.e. the convex hull of $X$.
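
The proposition can be illustrated numerically. The sketch below assumes the shift vector takes the form $\vec{p^*} = X\,\mathrm{softmax}(W)$, which is what the equation label and the stated shapes suggest; the function names `softmax` and `convex_hull_shift`, and the choice to subtract $\vec{p^*}$ from every point to move the origin, are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np


def softmax(w):
    """Numerically stable softmax over a 1-D array."""
    z = w - np.max(w)
    e = np.exp(z)
    return e / e.sum()


def convex_hull_shift(x, w):
    """Shift points so that the origin moves to a convex combination of them.

    x: (C, U) array, U points with C coordinates each (points are columns).
    w: (U,) or (U, 1) array of unconstrained scores (assumed to come from
       some learned block; here they are just given).
    Returns the new origin p_star (C,), the shifted points (C, U),
    and the convex-combination coefficients (U,).
    """
    coeffs = softmax(np.asarray(w, dtype=float).reshape(-1))
    # Coefficients are non-negative and sum to 1, so p_star is a convex
    # combination of the columns of x and lies in their convex hull.
    p_star = x @ coeffs
    shifted = x - p_star[:, None]
    return p_star, shifted, coeffs


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.normal(size=(3, 50))   # 50 hypothetical joints in 3-D
    w = rng.normal(size=(50, 1))   # hypothetical unconstrained scores
    p_star, shifted, coeffs = convex_hull_shift(x, w)
    print("coefficients sum:", coeffs.sum())
    print("new origin:", p_star)
```

Because softmax outputs are non-negative and sum to one, any unconstrained $W$ yields a valid set of convex-combination coefficients, so the constraint holds by construction rather than through an explicit projection. With all-zero scores the coefficients are uniform and the new origin reduces to the mean of the points.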

Figures (10)

  • Figure 1: Inter-entity distribution discrepancies in the multi-entity action recognition task. (a) We delineate three distinct settings: Vanilla (a common practice), S2CoM (an intuitive baseline approach), and CHASE (our proposed method). Column 2 illustrates spatiotemporal point clouds defined by the skeletons over $10^4$ sequences. Columns 3-5 depict the projections of estimated distributions of these point clouds onto the x-y, z-x, and y-z planes. These projections reveal significant inter-entity distribution discrepancies when using Vanilla. (b) The discrepancies observed in Vanilla introduce bias into backbone models, leading to unsatisfactory performance. Although S2CoM can reduce these discrepancies, it makes the classifiers produce wrong predictions due to a complete loss of inter-entity information. With the lowest inter-entity discrepancy, our method unbiases the subsequent backbone to get the highest accuracy, underscoring its efficacy.
  • Figure 2: The overall framework of the proposed CHASE for multi-entity action recognition. Given a skeleton sequence of multi-entity action as input, CHASE executes an implicit convex hull constrained adaptive shift with the Coefficient Learning Block, implemented as a lightweight backbone wrapper. CHASE also collects pair-wise shifted skeletons within mini-batches, effectively alleviating inter-entity distribution discrepancies by introducing an additional objective.
  • Figure 3: Visualizations of multi-entity action samples and their skeleton convex hulls.
  • Figure 4: Qualitative results of CHASE. Different entity distributions are denoted by blue and orange. CHASE effectively mitigates inter-entity distribution discrepancies, demonstrating its clear effectiveness across a range of data scales, from small to large.
  • Figure 5: UMAP [mcinnes2018umap-software] visualizations of multi-entity skeleton sequence representations on the test split of NTU Mutual 26 X-Sub. Compared with Vanilla, our proposed CHASE differentiates similar multi-entity actions better by helping backbones learn more distinctive representations.
  • ...and 5 more figures

Theorems & Definitions (7)

  • Definition 1: Convex Hull [Rockafellar1996]
  • Proposition 1
  • Proof
  • Definition 2: Skeleton Sequence of A Multi-Entity Action
  • Definition 3: Skeleton-based Multi-Entity Action Recognition
  • Definition 4: Joints & Bones
  • Definition 5: k-hop Bones [InfoGCN2022]