SignBot: Learning Human-to-Humanoid Sign Language Interaction

Guanren Qiao, Sixu Lin, Ronglai Zuo, Zhizheng Wu, Kui Jia, Guiliang Liu

TL;DR

This work presents SignBot, a three-stage framework for human-to-humanoid sign-language interaction that combines body-motion retargeting, a cerebellum-inspired control policy, and a cerebral module for translation, response, and sign-language generation. The system uses decoupled upper/lower-body control, sim-to-real training with domain randomization, and a SMPL-X-based generation pipeline to produce natural, sign-language-aligned robot motions. Simulation and real-world evaluations across multiple datasets and robot embodiments demonstrate high accuracy, strong generalization, natural interaction, and positive user feedback from the DHH community. Future work aims to add facial expressions and field data to capture regional sign variations and further improve realism and inclusivity.

Abstract

Sign language is a natural, visual form of language that uses movements and expressions to convey meaning, serving as a crucial means of communication for individuals who are deaf or hard-of-hearing (DHH). However, the number of people proficient in sign language remains limited, highlighting the need for technological advancements to bridge communication gaps and foster interaction with minority communities. Based on recent advancements in embodied humanoid robots, we propose SignBot, a novel framework for human-robot sign language interaction. SignBot integrates a cerebellum-inspired motion control component and a cerebral-oriented module for comprehension and interaction. Specifically, SignBot consists of: 1) Motion Retargeting, which converts human sign language datasets into robot-compatible kinematics; 2) Motion Control, which leverages a learning-based paradigm to develop a robust humanoid control policy for tracking sign language gestures; and 3) Generative Interaction, which incorporates a sign language translator, responder, and generator, thereby enabling natural and effective communication between robots and humans. Simulation and real-world experimental results demonstrate that SignBot can effectively facilitate human-robot interaction and perform sign language motions with diverse robots and datasets. SignBot represents a significant advancement in automatic sign language interaction on embodied humanoid robot platforms, providing a promising solution to improve communication accessibility for the DHH community.
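
To make the data flow between the three stages concrete, the following is a minimal sketch, not the authors' implementation. It assumes motions are lists of per-frame joint-angle dictionaries; the joint names, the limits in `JOINT_LIMITS`, the `Cerebral` container, and the `interact` function are all hypothetical and exist only for illustration. Retargeting is reduced here to dropping joints the robot lacks and clipping to its limits, and the translator, responder, generator, and tracking policy are stand-in callables.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

Frame = Dict[str, float]  # joint name -> angle in radians

# Hypothetical joint limits in radians; a real robot defines its own.
JOINT_LIMITS: Dict[str, Tuple[float, float]] = {
    "left_elbow": (-1.2, 2.6),
    "right_elbow": (-1.2, 2.6),
}


def retarget(human_frames: List[Frame]) -> List[Frame]:
    """Stage 1 (simplified): keep only robot joints and clip to their limits."""
    robot_frames = []
    for frame in human_frames:
        robot_frames.append({
            # Joints missing from the human frame default to 0.0 before clipping.
            name: min(max(frame.get(name, 0.0), lo), hi)
            for name, (lo, hi) in JOINT_LIMITS.items()
        })
    return robot_frames


@dataclass
class Cerebral:
    """Stage 3 (stand-ins): the three cerebral components as plain callables."""
    translate: Callable[[List[Frame]], str]  # observed signing -> text
    respond: Callable[[str], str]            # text -> reply text
    generate: Callable[[str], List[Frame]]   # reply text -> human-like motion


def interact(observed: List[Frame], brain: Cerebral,
             track: Callable[[List[Frame]], None]) -> str:
    """One interaction turn: understand, reply, then sign the reply."""
    text = brain.translate(observed)
    reply = brain.respond(text)
    motion = brain.generate(reply)
    track(retarget(motion))  # Stage 2: the control policy tracks the targets
    return reply


# Usage with stub components; `executed` records what the policy would track.
executed: List[Frame] = []
brain = Cerebral(
    translate=lambda frames: "hello",
    respond=lambda text: f"reply to {text}",
    generate=lambda text: [{"left_elbow": 3.0, "head_yaw": 0.5}],
)
reply = interact([], brain, executed.extend)
```

In this toy run the generated `left_elbow` angle exceeds the assumed limit and is clipped, while `head_yaw` is dropped because the hypothetical robot has no such joint. A learned control policy and a real retargeting method would replace the stubs.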

Paper Structure

This paper contains 13 sections, 3 equations, 7 figures, 8 tables.

Figures (7)

  • Figure 1: Motivation of SignBot: Human-Robot Sign Language Interaction.
  • Figure 2: Overview of SignBot: The framework consists of three stages: (1) Motion Retargeting aligns human sign language gestures with the body structure of humanoid robots (Section \ref{sec:motion_retargeting}). In addition, we use the processed mesh along with text labels to train the sign language generator model. (2) Cerebellum performs Sim2Real policy training, which enables the robot to track various sign language gestures in simulation and then deploys the policy in the real world (Section \ref{sec:cerebellar}). (3) Cerebral conducts sign language reasoning to facilitate communication with sign language users through its sign language translator, responder, and generator (Section \ref{sec:brain}).
  • Figure 3: Sign Language Alignment between Human and Robots: The first row shows the source video of human sign language, the second row shows the mesh obtained from video processing, and the last row shows the results on different robots. For the H1 robot, the red nodes represent the robot's DoF positions, while the green nodes represent the retargeted demonstration nodes.
  • Figure 4: An example of real-world interaction between the robot and the human customer.
  • Figure 5: Experiment Preparation: The above shows the environment for loading the H1/W1 robot in IsaacGym, with the H1/W1 robot in the lower left and the Linker hand dexterous hand in the lower right.
  • ...and 2 more figures