Occlusion-Aware 3D Motion Interpretation for Abnormal Behavior Detection

Su Li, Wang Liang, Jianye Wang, Ziheng Zhang, Lei Zhang

TL;DR

OAD2D detects motion abnormalities by reconstructing the 3D coordinates of mesh vertices and human joints from monocular videos. Because it reconstructs human motion trajectories in global coordinates, it remains robust to severe occlusion and self-occlusion in abnormal behavior detection.

Abstract

Estimating abnormal posture from 3D pose is vital in human pose analysis, yet it remains challenging, especially when 3D human poses must be reconstructed from monocular data with occlusions. Accurate reconstructions make it possible to restore 3D movements, which in turn helps extract the semantic details needed to analyze abnormal behaviors. However, most existing methods rely on predefined key points to estimate the coordinates of occluded joints, so variations in data quality degrade their performance. In this paper, we present OAD2D, which detects motion abnormalities by reconstructing the 3D coordinates of mesh vertices and human joints from monocular videos. OAD2D employs optical flow to capture motion priors in video streams, enriching the information available on occluded human movements and ensuring spatio-temporal alignment of poses. Moreover, we reformulate abnormal posture estimation by coupling it with a Motion-to-Text (M2T) model, in which a VQVAE quantizes motion features. This approach maps motion tokens to text tokens, allowing a semantically interpretable analysis of motion and using a language model to improve the generalization of abnormal posture detection. Our approach is robust to severe occlusion and self-occlusion in abnormal behavior detection, because it reconstructs human motion trajectories in global coordinates to mitigate occlusion issues. Our method, validated on the Human3.6M, 3DPW, and NTU RGB+D datasets, achieves an $F_1$-score of 0.94 on the NTU RGB+D dataset for medical condition detection. We will release all of our code and data.
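For reference, the $F_1$-score is conventionally defined as the harmonic mean of precision $P$ and recall $R$. This is the standard definition, stated here as an assumption about the metric used; the abstract does not report the underlying precision and recall values, nor the counts of true positives ($TP$), false positives ($FP$), and false negatives ($FN$).

$$F_1 = \frac{2PR}{P + R} = \frac{2\,TP}{2\,TP + FP + FN}$$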

Paper Structure (11 sections, 9 equations, 3 figures, 5 tables)

This paper contains 11 sections, 9 equations, 3 figures, 5 tables.

Figures (3)

  • Figure 1: The M2T model captures motion semantics, and an LLM can use specific prompts to identify behavior classes in medical conditions.
  • Figure 2: Overview of our method (OAD2D). The pipeline consists of pose estimation from images and optical flow, followed by trajectory optimization, motion quantization, and LM classification. (a) A two-stream neural network generates 3D heatmaps to facilitate regression to the 3D joints and, when integrated with kinematics, outputs shape parameters $\beta$, twist angles $\varphi$, and pose $\theta$ for motion reconstruction ($\Theta$). (b) The global trajectory predictor produces the corresponding global trajectories $(T, R)$, which comprise root translation $T$ and rotation $R$. This motion data is then quantized by a VQVAE to construct the codebook $E$ and the motion tokens, which are subsequently used to generate semantic motion representations for abnormal behavior detection through sentiment analysis via an LLM.
  • Figure 3: Qualitative Results on NTU-MC.
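
The quantization step mentioned in the Figure 2 caption, where a VQVAE maps motion features to a codebook $E$ and to discrete motion tokens, can be illustrated with a minimal sketch. This shows only the generic nearest-neighbor codebook lookup used in vector quantization; it is not the authors' implementation. The function name `quantize_motion`, the toy codebook, and the 2-dimensional features are all hypothetical choices made for illustration, and the encoder, decoder, and training losses of a real VQVAE are omitted.

```python
import numpy as np


def quantize_motion(features: np.ndarray, codebook: np.ndarray):
    """Assign each feature vector to its nearest codebook entry.

    features: (T, D) array, one D-dimensional motion feature per frame.
    codebook: (K, D) array, K learned code vectors (hypothetical values here).
    Returns the token indices (T,) and the quantized features (T, D).
    """
    # Pairwise squared Euclidean distances between frames and codes: (T, K).
    diffs = features[:, None, :] - codebook[None, :, :]
    dists = np.sum(diffs ** 2, axis=-1)
    # The motion token of a frame is the index of its closest code vector.
    tokens = np.argmin(dists, axis=1)
    # The quantized feature is the code vector itself.
    quantized = codebook[tokens]
    return tokens, quantized


# Toy example (illustrative numbers only, not from the paper).
codebook = np.array([[0.0, 0.0], [1.0, 1.0], [-1.0, 2.0]])
features = np.array([[0.1, -0.1], [0.9, 1.2], [-0.8, 1.9], [0.6, 0.6]])
tokens, quantized = quantize_motion(features, codebook)
print(tokens)
```

In a pipeline of the kind the caption describes, the resulting token sequence would be what is passed on to the language model, since discrete indices can be handled like a vocabulary.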