Off-the-shelf ChatGPT is a Good Few-shot Human Motion Predictor

Haoxuan Qu, Zhaoyang He, Zeyu Hu, Yujun Cai, Jun Liu

TL;DR

This work tackles few-shot human motion prediction without training a dedicated model by leveraging an off-the-shelf language model, ChatGPT. It introduces FMP-OC, which combines implicit knowledge extraction (sequence-description linkage and kinematic-chain-of-thought) with motion-in-context learning (pre-selection of base samples and practice-exam demonstrations) to make ChatGPT predict future poses from past observations. Empirical results on Human3.6M and CMU Mocap show state-of-the-art performance in a totally training-free setting, outperforming established few-shot methods across short- and long-term horizons. The approach demonstrates that large language models can be repurposed for non-language reasoning tasks with carefully designed prompts and context strategies, enabling practical, training-free motion prediction.
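The TL;DR names a kinematic-chain-of-thought design, but this summary does not spell out the corresponding prompt. One plausible reading, offered only as an illustration, is to have the model reason joint by joint along the skeleton's kinematic tree so that each joint is predicted after its parent. The sketch below derives such an ordering by breadth-first traversal; the 8-joint skeleton, `PARENTS`, and the helpers `kinematic_order` and `chain_of_thought_steps` are hypothetical and reproduce neither the Human3.6M or CMU Mocap joint layouts nor the authors' prompt.

```python
from collections import deque

# Hypothetical 8-joint skeleton (NOT the Human3.6M or CMU Mocap layout).
# PARENTS[j] is the parent joint of j; -1 marks the root.
JOINT_NAMES = ["hip", "spine", "neck", "head",
               "l_shoulder", "l_elbow", "r_shoulder", "r_elbow"]
PARENTS = [-1, 0, 1, 2, 2, 4, 2, 6]


def kinematic_order(parents):
    """Breadth-first traversal from the root(s): every joint comes after its parent."""
    children = {j: [] for j in range(len(parents))}
    roots = []
    for j, p in enumerate(parents):
        if p < 0:
            roots.append(j)
        else:
            children[p].append(j)
    order, queue = [], deque(roots)
    while queue:
        j = queue.popleft()
        order.append(j)
        queue.extend(children[j])
    return order


def chain_of_thought_steps(names, parents):
    """Turn the ordering into hypothetical step-by-step reasoning instructions."""
    steps = []
    for j in kinematic_order(parents):
        if parents[j] < 0:
            steps.append(f"Step: predict the {names[j]} (root) first.")
        else:
            steps.append(f"Step: predict the {names[j]} given the {names[parents[j]]}.")
    return steps


if __name__ == "__main__":
    for step in chain_of_thought_steps(JOINT_NAMES, PARENTS):
        print(step)
```

Breadth-first order is one valid choice; any topological order of the tree (for example depth-first, finishing one limb before starting the next) also keeps every parent ahead of its children.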

Abstract

To facilitate the practical application of motion prediction, the few-shot motion prediction task has recently attracted increasing research attention. Yet existing few-shot motion prediction works generally require a specific model trained dedicatedly on human motions. In this work, rather than training a specific human motion prediction model, we propose a novel FMP-OC framework. In FMP-OC, in a totally training-free manner, we enable Few-shot Motion Prediction, a non-language task, to be performed directly by the Off-the-shelf language model ChatGPT. Specifically, to turn ChatGPT, a language model, into an accurate motion predictor, we first introduce several novel designs in FMP-OC that facilitate extracting implicit knowledge from ChatGPT. Moreover, we incorporate a motion-in-context learning mechanism into our framework. Extensive experiments demonstrate the efficacy of our proposed framework.

Paper Structure

This paper contains 14 sections, 1 theorem, 1 equation, 4 figures, 3 tables, 2 algorithms.

Key Result

Theorem 1

Define the optimal subset $S^*_{sub}$ as the subset of the base training set $S$ that has the highest representativeness among all subsets of $S$ containing $P$ motion samples, and denote by $r^*$ the representativeness of $S^*_{sub}$. The relationship between the representativeness $r$ of the subset $S_{sub}$ …
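Theorem 1 concerns pre-selecting a size-$P$ subset of the base set $S$ with high representativeness, but neither the representativeness measure nor the full statement survives in this excerpt. As a stand-in, the sketch below assumes a hypothetical facility-location score (each sample is credited with its Gaussian-kernel similarity to the nearest chosen sample), selects a subset greedily, and compares its score $r$ with the exhaustive optimum $r^*$. Because this particular score is monotone and submodular, the classic greedy guarantee $r \ge (1 - 1/e)\,r^*$ applies to it; this is not a reconstruction of Theorem 1, and all function names are illustrative.

```python
import itertools
import math
import random


def similarity(a, b):
    # Gaussian kernel on Euclidean distance; always in (0, 1].
    return math.exp(-sum((x - y) ** 2 for x, y in zip(a, b)))


def representativeness(subset, data):
    """Hypothetical facility-location score: each sample in `data` is credited
    with its similarity to the closest sample chosen in `subset` (indices)."""
    if not subset:
        return 0.0
    return sum(max(similarity(x, data[j]) for j in subset) for x in data)


def greedy_subset(data, P):
    """Add, P times, the sample that raises the score the most."""
    if P > len(data):
        raise ValueError("P cannot exceed the number of base samples")
    chosen = []
    for _ in range(P):
        best = max((j for j in range(len(data)) if j not in chosen),
                   key=lambda j: representativeness(chosen + [j], data))
        chosen.append(best)
    return chosen


def optimal_subset(data, P):
    # Exhaustive search over all size-P subsets; feasible only for tiny S.
    return list(max(itertools.combinations(range(len(data)), P),
                    key=lambda c: representativeness(list(c), data)))


if __name__ == "__main__":
    rng = random.Random(0)
    S = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(10)]  # toy features
    P = 3
    r = representativeness(greedy_subset(S, P), S)
    r_star = representativeness(optimal_subset(S, P), S)
    print(f"greedy r = {r:.3f}, optimal r* = {r_star:.3f}, ratio = {r / r_star:.3f}")
```

Exhaustive search over all $\binom{|S|}{P}$ subsets is only practical at toy sizes, which is why a greedy selection with a provable approximation ratio is attractive for pre-selecting base samples.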

Figures (4)

  • Figure 1: Illustration of our FMP-OC framework. As shown, we first equip our framework with two designs (in yellow and pink, respectively) to effectively extract (mine) implicit knowledge from ChatGPT. Moreover, we equip our framework with a motion-in-context learning mechanism (in green), which further increases ChatGPT's familiarity with the human motion prediction task.
  • Figure 2: Illustration of the language command for ChatGPT in which the human pose in each frame of the motion sequence is guided to be linked to its corresponding motion description. In the example shown, both the length $L$ of the past observed motion sequence and the length $J$ of the future motion sequence are set to 5.
  • Figure 3: Qualitative results of our framework and the recent state-of-the-art few-shot human motion prediction method GraphHetNet [drumond2023few] on the CMU Mocap dataset.
  • Figure 4: Qualitative results of our framework and the recent state-of-the-art few-shot human motion prediction method GraphHetNet [drumond2023few] on the Human3.6M dataset.
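Figure 2 describes a language command in which each observed frame's pose is linked to a motion description, with $L = 5$ past and $J = 5$ future frames. The exact wording of that command appears only in the figure, so the sketch below assumes a hypothetical text format, `Frame t [description]: (x, y, z); ...`, to show how poses could be serialized into a prompt and how a reply in the same format could be parsed back into coordinates. `build_prompt`, `parse_prediction`, and the format itself are illustrative, not the authors' prompt.

```python
import re

L_PAST = 5    # observed frames, matching the Figure 2 example
J_FUTURE = 5  # frames to predict, matching the Figure 2 example

# Hypothetical line format: "Frame t [description]: (x, y, z); (x, y, z); ..."
FRAME_RE = re.compile(r"^Frame (\d+) \[[^\]]*\]: (.*)$")
NUM = r"(-?\d+(?:\.\d+)?)"
TRIPLE_RE = re.compile(rf"\(\s*{NUM},\s*{NUM},\s*{NUM}\s*\)")


def format_frame(t, pose, description):
    """Serialize one frame: its index, a motion description, then rounded joint coordinates."""
    coords = "; ".join(f"({x:.2f}, {y:.2f}, {z:.2f})" for x, y, z in pose)
    return f"Frame {t} [{description}]: {coords}"


def build_prompt(past_poses, descriptions, num_future):
    """Link every observed pose to its description, then ask for the next frames."""
    if len(past_poses) != len(descriptions):
        raise ValueError("need one description per observed frame")
    lines = [format_frame(t, pose, desc)
             for t, (pose, desc) in enumerate(zip(past_poses, descriptions), start=1)]
    first = len(past_poses) + 1
    lines.append(f"Predict frames {first} to {first + num_future - 1} in the same format, "
                 "first describing the motion in each frame.")
    return "\n".join(lines)


def parse_prediction(text):
    """Parse reply lines in the same format into {frame index: [(x, y, z), ...]}."""
    frames = {}
    for line in text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames[int(m.group(1))] = [tuple(float(v) for v in triple)
                                       for triple in TRIPLE_RE.findall(m.group(2))]
    return frames


if __name__ == "__main__":
    # Toy 2-joint poses for L_PAST frames; not real motion-capture data.
    past = [[(0.0, 0.0, 1.0 + 0.01 * t), (0.1 * t, 0.0, 0.5)] for t in range(L_PAST)]
    print(build_prompt(past, ["walking forward"] * L_PAST, J_FUTURE))
    # A made-up reply line, used only to exercise the parser.
    reply = "Frame 6 [walking forward]: (0.00, 0.00, 1.05); (0.50, 0.00, 0.50)"
    print(parse_prediction(reply))
```

Rounding coordinates to two decimals keeps the prompt short. The parser silently skips lines that do not match the format, so a real pipeline would also need to check that all $J$ requested frames came back.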