Enhancing Human Motion Prediction via Multi-range Decoupling Decoding with Gating-adjusting Aggregation
Jiexin Wang, Wenwen Qiang, Zhao Yang, Bing Su
TL;DR
The paper addresses horizon-dependent temporal correlations in human motion prediction by introducing MD2GA, a two-stage framework that (i) decouples decoding across multiple future ranges with a Multi-range Decoupling Decoder (MDD) and (ii) fuses the horizon-specific predictions with a gating-adjusting aggregation (GA). The MDD uses $K$ decoders to produce outputs $Y_k$ at horizons $L_k$, while GA computes mixing weights $\boldsymbol{\text{w}}$ via a lightweight gating network and blends outputs with an attention mask $A_{k,t}$. The method is designed to be easily integrated with existing HMP models and is trained with a joint loss $\boldsymbol{\text{L}}=\boldsymbol{\text{L}}_1+\boldsymbol{\text{L}}_2$ that encourages horizon-specific decodings to align with a shared representation. Experiments on H3.6M, CMU-Mocap, and 3DPW show consistent MPJPE reductions across short- and long-term predictions, demonstrating improved motion representation learning and robustness across architectures. The approach offers practical benefits for real-world motion prediction systems due to its simplicity and wide compatibility.
Abstract
Expressive representation of pose sequences is crucial for accurate motion modeling in human motion prediction (HMP). While recent deep learning-based methods have shown promise in learning motion representations, they tend to overlook how the relevance of historical information varies across future moments: the correlation is stronger for short-term predictions and weaker for distant ones. This limits motion representation learning and, in turn, hampers prediction performance. In this paper, we propose a novel approach called multi-range decoupling decoding with gating-adjusting aggregation (MD2GA), which leverages these temporal correlations to refine motion representation learning. The approach employs a two-stage strategy for HMP. In the first stage, multi-range decoupling decoding adjusts feature learning by decoding the shared features into distinct future lengths, where different decoders offer diverse insights into motion patterns. In the second stage, gating-adjusting aggregation dynamically combines these insights, guided by the input motion data. Extensive experiments demonstrate that the proposed method can be easily integrated into other motion prediction methods and enhance their prediction performance.
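The two-stage strategy can be illustrated with a small NumPy sketch. It is not the authors' implementation. It assumes linear decoders, a softmax gate driven by the shared feature, a binary mask that is 1 wherever a decoder's horizon covers a frame, and a per-frame renormalisation of the masked weights. The names `aggregate`, `horizons`, `decoders`, and `gate_W`, and all sizes, are hypothetical.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def aggregate(outputs, horizons, gate_logits):
    """Blend K horizon-specific predictions into one sequence.

    outputs[k]  : array of shape (L_k, D), prediction of decoder k
    horizons[k] : L_k, number of future frames decoder k produces
    gate_logits : array of shape (K,), raw scores from the gating network
    """
    K, T, D = len(outputs), max(horizons), outputs[0].shape[1]
    w = softmax(gate_logits)  # mixing weights, shape (K,)
    # Assumed mask: mask[k, t] = 1 if decoder k covers frame t, else 0.
    mask = (np.arange(T)[None, :] < np.asarray(horizons)[:, None]).astype(float)
    # Zero-pad every decoder output to the longest horizon.
    padded = np.zeros((K, T, D))
    for k, y in enumerate(outputs):
        padded[k, :horizons[k]] = y
    # Assumed normalisation: masked weights sum to 1 at every frame.
    coef = w[:, None] * mask
    coef = coef / coef.sum(axis=0, keepdims=True)
    return np.einsum("kt,ktd->td", coef, padded)

# Stage 1: decode one shared feature into several future ranges.
rng = np.random.default_rng(0)
horizons = [5, 10, 25]       # hypothetical L_k values
feat_dim, pose_dim = 32, 66  # hypothetical sizes
shared = rng.normal(size=feat_dim)
decoders = [0.01 * rng.normal(size=(feat_dim, L * pose_dim)) for L in horizons]
outputs = [(shared @ W).reshape(L, pose_dim) for W, L in zip(decoders, horizons)]

# Stage 2: gate on the shared feature and aggregate.
gate_W = rng.normal(size=(feat_dim, len(horizons)))
pred = aggregate(outputs, horizons, shared @ gate_W)
print(pred.shape)  # (25, 66)
```

In this sketch, frames beyond the shorter horizons are filled only by the decoders that reach them, so the last 15 frames come entirely from the longest-range decoder, while early frames mix all three.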
