GPT-Connect: Interaction between Text-Driven Human Motion Generator and 3D Scenes in a Training-free Manner

Haoxuan Qu, Ziyan Guo, Jun Liu

TL;DR

This paper proposes GPT-Connect, a framework that generates scene-aware motion sequences directly with an existing blank-background human motion generator, using ChatGPT to connect the generator to the 3D scene in a fully training-free manner.

Abstract

Text-driven human motion generation has received substantial research attention, yet most existing text-driven motion generators are designed to produce motion sequences only against a blank background. In practice, humans perform their motions in 3D scenes. We therefore aim to perform scene-aware text-driven motion generation. Training a separate scene-aware motion generator in a supervised way, however, can require a large number of motion samples to be laboriously collected and annotated across a wide range of 3D scenes. To handle this task more conveniently, we propose GPT-Connect, a novel framework that generates scene-aware motion sequences directly with an existing blank-background human motion generator, using ChatGPT to connect the generator to the 3D scene in a fully training-free manner. Extensive experiments demonstrate the efficacy and generalizability of the proposed framework.

Paper Structure (12 sections, 6 equations, 3 figures, 3 tables)

Figures (3)

  • Figure 1: Scene-aware motion sequences generated by our GPT-Connect framework in different 3D scenes and from different text prompts, in a fully training-free manner. Within each motion sequence, the human meshes go from light to dark colors as time passes.
  • Figure 2: Illustration of the process of describing $S_{3D}$ in a format that is understandable to ChatGPT.
  • Figure 3: Qualitative results of our framework. (a-d) use 3D scenes from HUMANISE, while (e-f) use outdoor 3D scenes outside HUMANISE. The human meshes go from light to dark colors as time passes. More qualitative results are in the supplementary.
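
Figure 2 refers to describing the 3D scene $S_{3D}$ in a format that ChatGPT can understand. The paper's actual format is not reproduced on this page, so the following is only an illustrative sketch of the general idea. It assumes a scene can be summarized as labeled, axis-aligned bounding boxes. The names `SceneObject` and `describe_scene` and the output layout are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SceneObject:
    """Hypothetical scene entry: a labeled axis-aligned bounding box."""
    label: str
    center: Tuple[float, float, float]  # (x, y, z), assumed to be in meters
    size: Tuple[float, float, float]    # box extents along x, y, z


def describe_scene(objects: List[SceneObject]) -> str:
    """Serialize a scene into plain text that could be placed in an LLM prompt.

    This is an assumed format for illustration, not the paper's format.
    """
    lines = [f"The scene contains {len(objects)} object(s); coordinates are in meters."]
    for index, obj in enumerate(objects, start=1):
        cx, cy, cz = obj.center
        sx, sy, sz = obj.size
        lines.append(
            f"{index}. {obj.label}: "
            f"center=({cx:.2f}, {cy:.2f}, {cz:.2f}), "
            f"size=({sx:.2f}, {sy:.2f}, {sz:.2f})"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    scene = [
        SceneObject("sofa", (1.0, 0.5, 0.4), (2.0, 0.9, 0.8)),
        SceneObject("table", (3.0, 2.0, 0.35), (1.2, 0.6, 0.7)),
    ]
    print(describe_scene(scene))
```

A text description like this could be combined with the motion prompt before querying the language model. How GPT-Connect actually encodes $S_{3D}$ and uses ChatGPT's response is specified in the paper itself.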