GPT-Connect: Interaction between Text-Driven Human Motion Generator and 3D Scenes in a Training-free Manner

Haoxuan Qu; Ziyan Guo; Jun Liu

TL;DR

This paper proposes GPT-Connect, a framework that generates scene-aware motion sequences directly with an existing blank-background human motion generator, using ChatGPT to connect that generator with the 3D scene in a fully training-free manner.

Abstract

Text-driven human motion generation has recently received substantial research attention, yet most existing text-driven motion generators are designed to generate motion sequences only against a blank background. In practice, however, humans perform their motions in 3D scenes. We therefore aim to perform scene-aware text-driven motion generation. The intuitive approach, training a separate scene-aware motion generator in a supervised way, can require a large number of motion samples to be laboriously collected and annotated across many different 3D scenes. To handle this task more conveniently, we propose a novel framework, GPT-Connect. In GPT-Connect, scene-aware motion sequences are generated directly with an existing blank-background human motion generator, by leveraging ChatGPT to connect that generator with the 3D scene in a fully training-free manner. Extensive experiments demonstrate the efficacy and generalizability of the proposed framework.

Paper Structure (12 sections, 6 equations, 3 figures, 3 tables)

Introduction
Related Work
Proposed Method
GPT-Generator Channel
Scene-GPT Channel
Overall Inference Process
Experiments
Dataset and Evaluation Metrics
Implementation Details
Main Results
Ablation Studies
Conclusion

Figures (3)

Figure 1: Illustration of the scene-aware motion sequences generated by our GPT-Connect framework in different 3D scenes and based on different text prompts, in a fully training-free manner. Human meshes in each motion sequence progress from light to dark colors over time.
Figure 2: Illustration of the process of describing $S_{3D}$ in a format that is understandable to ChatGPT.
Figure 3: Qualitative results of our framework. (a-d) are on the 3D scenes in HUMANISE, while (e-f) are on outdoor 3D scenes outside HUMANISE. Light to dark colors of the human meshes denote time. More qualitative results are in the supplementary.
