Relation Learning and Aggregate-attention for Multi-person Motion Prediction

Kehua Qu, Rui Ding, Jin Tang

TL;DR

This paper proposes a distance-aware cross-attention that incorporates physical distance constraints into inter-relation learning through a learnable distance weighting coefficient, together with a novel plug-and-play Interaction Aggregation Module (IAM) that uses an aggregate-attention mechanism to integrate intra- and inter-relations.

Abstract

Multi-person motion prediction is an emerging and intricate task with broad real-world applications. Unlike single-person motion prediction, it considers not just skeleton structures or human trajectories but also the interactions between individuals. Previous methods use various networks to achieve impressive predictions but often overlook that the joint relations within an individual (intra-relations) and the interactions among groups (inter-relations) are distinct types of representations. These methods often lack an explicit representation of inter- and intra-relations and inevitably introduce undesired dependencies. To address this issue, we introduce a new collaborative framework for multi-person motion prediction that explicitly models these relations: a GCN-based network for intra-relations and a novel reasoning network for inter-relations. Moreover, we propose a novel plug-and-play aggregation module called the Interaction Aggregation Module (IAM), which employs an aggregate-attention mechanism to seamlessly integrate these relations. Experiments indicate that the module can also be applied to other dual-path models. Extensive experiments on 3DPW, 3DPW-RC, CMU-Mocap, and MuPoTS-3D, as well as the synthesized datasets Mix1 and Mix2 (9 to 15 persons), demonstrate that our method achieves state-of-the-art performance.
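To make the idea of a distance-aware cross-attention from the TL;DR more concrete, here is a minimal NumPy sketch. It is not the paper's formulation: this summary does not give the equation, so the sketch assumes the distance term is subtracted from the attention logits, scaled by a scalar `lam` that stands in for the learnable distance weighting coefficient. The function name, argument names, and the omission of query/key/value projections are all illustrative assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along the given axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def distance_aware_cross_attention(q_feat, kv_feat, q_pos, kv_pos, lam=1.0):
    """Cross-attention from person n's joints to person m's joints.

    q_feat:  (J, D) joint features of person n (queries)
    kv_feat: (K, D) joint features of person m (keys and values)
    q_pos:   (J, 3) joint positions of person n
    kv_pos:  (K, 3) joint positions of person m
    lam:     hypothetical stand-in for the learnable distance coefficient

    Assumption: distances enter as an additive penalty on the logits,
    so joints that are far apart receive less attention.
    """
    d = q_feat.shape[-1]
    # Scaled dot-product logits between every query joint and key joint.
    logits = q_feat @ kv_feat.T / np.sqrt(d)  # (J, K)
    # Pairwise Euclidean distance between joints of the two persons.
    dist = np.linalg.norm(q_pos[:, None, :] - kv_pos[None, :, :], axis=-1)  # (J, K)
    weights = softmax(logits - lam * dist, axis=-1)  # rows sum to 1
    return weights @ kv_feat, weights
```

With `lam=0` the function reduces to ordinary scaled dot-product cross-attention, and as `lam` grows the attention concentrates on the physically nearest joint. In a trained model `lam` would be a parameter updated by gradient descent rather than a fixed argument.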

Paper Structure

This paper contains 28 sections, 17 equations, 12 figures, 7 tables.

Figures (12)

  • Figure 1: Compared to previous methods with relation learning (wang2021multi, 9709907, xu2023joint, 10194334), we propose a new collaborative learning framework that explicitly explores joint relations, including intra-relations and inter-relations. The red and blue dashed lines indicate inter-relations and intra-relations, respectively.
  • Figure 2: Visualization of the Pearson correlation coefficient (PCC) between different individuals. We ran experiments with two Transformer-based architectures on the CMU-Mocap dataset: (i) explicit relation modeling, which uses cross-attention to learn inter-relations between different individuals' joints and self-attention to learn the intra-relations of each individual's joints; (ii) global modeling, which uses self-attention to learn all relations among all input skeleton joints. For each scene, the upper image shows the true scene in the sequence, and the lower image visualizes the PCC between the 15 joints of person 1 and the joints of the other two persons. Red indicates higher correlation (larger PCC) between two joints, while blue indicates lower correlation (smaller PCC).
  • Figure 3: Illustration of different fusion strategies: (a) TRiPOD 9709907 concatenates features from two different branches, then feeds them to an RNN decoder. (b) MRT wang2021multi feeds distinct features to a Transformer decoder to explore their dependency automatically. (c) Our method leverages inter- and intra-relations through a dedicated fusion module (IAM).
  • Figure 4: The architecture of our framework. The method contains four parts: (i) Encoder, (ii) Intra- and inter-relation learning, (iii) Relation aggregation, (iv) Decoder.
  • Figure 5: Illustration of inter-relation learning. The figure shows the calculation of the inter-relation feature of the $n$-th person. $n$ and $m$ denote two distinct persons. $\bigotimes$ denotes multiplication.
  • ...and 7 more figures
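
The PCC heatmaps described in the Figure 2 caption can be sketched as a joint-by-joint correlation matrix between two persons. The caption does not say which quantity is correlated (raw coordinates or learned features), so the sketch below assumes each joint is represented by its flattened 3D coordinate trajectory. The function name `joint_pcc` and the array layout are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def joint_pcc(person_a, person_b):
    """Pearson correlation between every joint of two persons.

    person_a: (T, J, 3) joint coordinates over T frames
    person_b: (T, K, 3) joint coordinates over the same T frames
    Returns a (J, K) matrix with entries in [-1, 1].

    Assumption: each joint is described by its flattened trajectory
    of length T * 3.
    """
    # Reshape to one row per joint: (J, T*3) and (K, T*3).
    a = person_a.transpose(1, 0, 2).reshape(person_a.shape[1], -1)
    b = person_b.transpose(1, 0, 2).reshape(person_b.shape[1], -1)
    # Center each joint's trajectory.
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    # Covariance divided by the product of standard deviations.
    num = a @ b.T
    den = np.linalg.norm(a, axis=1)[:, None] * np.linalg.norm(b, axis=1)[None, :]
    # Guard against division by zero for joints that never move.
    return num / np.maximum(den, 1e-12)
```

For a scene with 15 joints per person, `joint_pcc(p1, p2)` yields a 15 x 15 matrix that could be rendered as a heatmap, with larger values shown in red and smaller values in blue as in the caption.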