Towards Motion Turing Test: Evaluating Human-Likeness in Humanoid Robots

Mingzhe Li; Mengyin Liu; Zekai Wu; Xincheng Lin; Junsheng Zhang; Ming Yan; Zengye Xie; Changwang Zhang; Chenglu Wen; Lan Xu; Siqi Shen; Cheng Wang

TL;DR

We propose the Motion Turing Test, a framework that evaluates whether human observers can discriminate between humanoid robot and human poses using only kinematic information, together with a simple baseline model for predicting human-likeness that outperforms several contemporary LLM-based methods.

Abstract

Humanoid robots have achieved significant progress in motion generation and control, exhibiting movements that appear increasingly natural and human-like. Inspired by the Turing Test, we propose the Motion Turing Test, a framework that evaluates whether human observers can discriminate between humanoid robot and human poses using only kinematic information. To facilitate this evaluation, we present the Human-Humanoid Motion (HHMotion) dataset, which consists of 1,000 motion sequences spanning 15 action categories, performed by 11 humanoid models and 10 human subjects. All motion sequences are converted into SMPL-X representations to eliminate the influence of visual appearance. We recruited 30 annotators to rate the human-likeness of each pose on a 0–5 scale, resulting in over 500 hours of annotation. Analysis of the collected data reveals that humanoid motions still exhibit noticeable deviations from human movements, particularly in dynamic actions such as jumping, boxing, and running. Building on HHMotion, we formulate a human-likeness evaluation task that aims to automatically predict human-likeness scores from motion data. Despite recent progress in multimodal large language models, we find that these models remain inadequate for assessing motion human-likeness. To address this, we propose a simple baseline model and demonstrate that it outperforms several contemporary LLM-based methods. The dataset, code, and benchmark will be publicly released to support future research in the community.

Paper Structure

This paper contains 28 sections, 2 equations, 10 figures, and 4 tables.

Introduction
Related Work
Human and Humanoid Robot Pose Estimation
Human and Humanoid Motion Generation
Motion Datasets
Human-humanoid Motion Dataset
Dataset Overview
Dataset Construction
Humanoid robot data
Human motion data
Human-humanoid Pose Estimation
Human-likeness Scoring
Dataset Quality
Analysis of Human-likeness Scores
Robot Motion Turing Test Benchmark
...and 13 more sections

Figures (10)

Figure 1: Motion Turing Test: Evaluators judge whether the pose sequence resembles human motion, focusing solely on motion without appearance cues.
Figure 2: Action sources, types, and category distribution in the HHMotion dataset, illustrating the diverse actions of both humans and humanoid robots.
Figure 3: Overview of the human scoring pipeline, where all the humanoid robot and human motions are converted into SMPL-X poses and evaluated by human annotators. The resulting 0–5 scores quantitatively assess the human-likeness of each motion.
Figure 4: Human-likeness scoring rules used in evaluating motion clips on a 0–5 Likert scale, focusing solely on motion quality.
Figure 5: Overall distribution of human-likeness scores for human and humanoid motions (left) and human-likeness scores for humanoid motions in simulated and real-world scenarios (right).
...and 5 more figures
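Figures 3–5 describe annotators rating each SMPL-X motion on a 0–5 Likert scale and then comparing score distributions across human and humanoid sources. How the individual ratings are pooled is not specified here; the sketch below assumes each clip's score is the mean of its annotators' ratings, and that group statistics are then taken over clip-level scores. The `Rating` record layout, field names, and example data are hypothetical.

```python
from collections import defaultdict
from statistics import mean
from typing import NamedTuple


class Rating(NamedTuple):
    """One annotator's score for one clip (hypothetical record layout)."""
    clip_id: str
    source: str    # assumed labels: "human" or "humanoid"
    category: str  # action category, e.g. "running"
    score: int     # Likert rating in [0, 5]


def clip_scores(ratings):
    """Average all annotator ratings of each clip into one clip-level score."""
    per_clip = defaultdict(list)
    meta = {}
    for r in ratings:
        if not 0 <= r.score <= 5:
            raise ValueError(f"score out of range: {r}")
        per_clip[r.clip_id].append(r.score)
        meta[r.clip_id] = (r.source, r.category)
    return {cid: (*meta[cid], mean(s)) for cid, s in per_clip.items()}


def group_means(clips):
    """Mean clip-level score for each (source, category) pair."""
    groups = defaultdict(list)
    for source, category, score in clips.values():
        groups[(source, category)].append(score)
    return {key: mean(vals) for key, vals in groups.items()}


# Illustrative (made-up) ratings: two annotators per clip.
ratings = [
    Rating("h01", "human", "running", 5),
    Rating("h01", "human", "running", 4),
    Rating("r01", "humanoid", "running", 2),
    Rating("r01", "humanoid", "running", 3),
    Rating("r02", "humanoid", "running", 1),
    Rating("r02", "humanoid", "running", 2),
]
clips = clip_scores(ratings)
print(group_means(clips))
```

Averaging per clip before grouping gives every clip equal weight even if some clips received more ratings than others; pooling raw ratings directly would instead weight heavily rated clips more.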
