Off-the-shelf ChatGPT is a Good Few-shot Human Motion Predictor
Haoxuan Qu, Zhaoyang He, Zeyu Hu, Yujun Cai, Jun Liu
TL;DR
This work tackles few-shot human motion prediction without training a dedicated model, using the off-the-shelf language model ChatGPT instead. It introduces FMP-OC, which combines implicit knowledge extraction (sequence-description linkage and kinematic-chain-of-thought) with motion-in-context learning (pre-selection of base samples and practice-exam demonstrations) so that ChatGPT predicts future poses from past observations. On Human3.6M and CMU Mocap, the authors report state-of-the-art performance in a totally training-free setting, outperforming established few-shot methods across short- and long-term horizons. The approach suggests that large language models can be repurposed for non-language reasoning tasks through carefully designed prompts and context strategies.
Abstract
The few-shot motion prediction task has recently attracted increasing research attention because it makes motion prediction easier to apply in practice. Yet existing few-shot motion prediction works generally require a specific model trained on human motions. In this work, rather than training such a model, we propose a novel framework, FMP-OC. In a totally training-free manner, FMP-OC enables Few-shot Motion Prediction, a non-language task, to be performed directly with the Off-the-shelf language model ChatGPT. Specifically, to turn ChatGPT, a language model, into an accurate motion predictor, we first introduce several novel designs that facilitate extracting implicit knowledge from ChatGPT. We also equip our framework with a motion-in-context learning mechanism. Extensive experiments demonstrate the efficacy of the proposed framework.
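
To make the motion-in-context idea more concrete, the following is a minimal sketch of how base samples might be pre-selected and serialized into a text prompt. It is an illustration under stated assumptions, not the authors' implementation: the distance measure (mean per-frame Euclidean distance), the prompt wording, the number formatting, and all names (`motion_distance`, `preselect`, `format_motion`, `build_prompt`) are hypothetical. The sketch does not cover sequence-description linkage, kinematic-chain-of-thought, or practice-exam demonstrations, and it does not call any ChatGPT API.

```python
from math import sqrt
from typing import List, Sequence, Tuple

Pose = Sequence[float]            # flattened joint coordinates of one frame
Motion = Sequence[Pose]           # a sequence of frames
Sample = Tuple[Motion, Motion]    # (observed past, ground-truth future)


def motion_distance(a: Motion, b: Motion) -> float:
    """Mean per-frame Euclidean distance (an assumed similarity measure)."""
    if len(a) != len(b):
        raise ValueError("motions must have the same number of frames")
    total = 0.0
    for frame_a, frame_b in zip(a, b):
        total += sqrt(sum((x - y) ** 2 for x, y in zip(frame_a, frame_b)))
    return total / len(a)


def preselect(query_past: Motion, base: Sequence[Sample], k: int) -> List[Sample]:
    """Keep the k base samples whose observed past is closest to the query."""
    ranked = sorted(base, key=lambda s: motion_distance(query_past, s[0]))
    return ranked[:k]


def format_motion(motion: Motion) -> str:
    """Serialize a motion as text: frames split by ';', coordinates by ','."""
    return "; ".join(",".join(f"{v:.1f}" for v in frame) for frame in motion)


def build_prompt(query_past: Motion, demos: Sequence[Sample]) -> str:
    """Assemble a few-shot prompt from demonstrations plus the query."""
    lines = ["Predict the future poses that follow the past poses."]
    for i, (past, future) in enumerate(demos, start=1):
        lines.append(f"Example {i} past: {format_motion(past)}")
        lines.append(f"Example {i} future: {format_motion(future)}")
    lines.append(f"Query past: {format_motion(query_past)}")
    lines.append("Query future:")
    return "\n".join(lines)
```

In this sketch, a caller would pass the observed frames and a small pool of base samples to `preselect`, hand the result to `build_prompt`, and send the returned string to the language model. Parsing the model's reply back into poses is left out.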
