SafeHumanoid: VLM-RAG-driven Control of Upper Body Impedance for Humanoid Robot

Yara Mahmoud; Jeffrin Sam; Nguyen Khang; Marcelino Fernando; Issatay Tokmurziyev; Miguel Altamirano Cabrera; Muhammad Haris Khan; Artem Lykov; Dzmitry Tsetserukou

TL;DR

SafeHumanoid presents a VLM–RAG pipeline that grounds egocentric vision into context-aware impedance and speed parameters for a humanoid robot. By retrieving validated per-joint gains and a nominal velocity from a curated scenario database and applying them through an IK-based controller, the approach bridges semantic scene understanding and safety-relevant control, supporting safer human-robot collaboration. Experiments on the Unitree G1 show that task success is preserved while safety-aware modulation adapts impedance and speed to human presence and object fragility, although offboard inference latency limits responsiveness in dynamic settings. The work demonstrates a practical path toward standard-compliant, semantics-driven safety in humanoid HRI and outlines concrete avenues for reducing latency and expanding the dataset.
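
The retrieval step described above can be illustrated with a minimal sketch. It assumes that each validated scenario description has already been embedded as a fixed-length vector and that matching is a cosine-similarity nearest-neighbour lookup. The Scenario fields, the min_score threshold, the conservative fallback entry, and all gain values are hypothetical illustrations, not the paper's actual database or embedding model.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Scenario:
    """Hypothetical entry in the curated scenario database."""
    name: str
    embedding: list[float]   # embedding of the validated scenario description
    stiffness: list[float]   # per-joint stiffness gains
    damping: list[float]     # per-joint damping gains
    nominal_speed: float     # nominal velocity associated with this scenario


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query: list[float], database: list[Scenario],
             fallback: Scenario, min_score: float = 0.5) -> tuple[Scenario, float]:
    """Return the most similar validated scenario and its similarity score.

    If no entry reaches the (hypothetical) min_score threshold, return the
    conservative fallback entry instead of trusting a weak match.
    """
    best = max(database, key=lambda s: cosine_similarity(query, s.embedding))
    score = cosine_similarity(query, best.embedding)
    return (best, score) if score >= min_score else (fallback, score)


# Toy usage: 3-dimensional embeddings and illustrative gains (not from the paper).
database = [
    Scenario("wiping_no_human", [1.0, 0.0, 0.0], [40.0, 40.0, 30.0], [4.0, 4.0, 3.0], 0.30),
    Scenario("handover_human_near", [0.0, 1.0, 0.0], [10.0, 10.0, 8.0], [2.0, 2.0, 1.5], 0.10),
]
safest = database[1]  # lowest stiffness and speed, used when no match is confident
chosen, score = retrieve([0.1, 0.9, 0.0], database, fallback=safest)
print(chosen.name, round(score, 3))
```

Falling back to the most compliant, slowest entry on a weak match is one possible design choice; it trades task speed for safety when the scene description does not resemble any validated scenario.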

Abstract

Safe and trustworthy Human-Robot Interaction (HRI) requires robots not only to complete tasks but also to regulate impedance and speed according to scene context and human proximity. We present SafeHumanoid, an egocentric vision pipeline that links Vision-Language Models (VLMs) with Retrieval-Augmented Generation (RAG) to schedule impedance and velocity parameters for a humanoid robot. Egocentric frames are processed with a structured VLM prompt, embedded, and matched against a curated database of validated scenarios; the retrieved parameters are then mapped to joint-level impedance commands via inverse kinematics. We evaluate the system on tabletop manipulation tasks with and without human presence, including wiping, object handovers, and liquid pouring. The results show that the pipeline adapts stiffness, damping, and speed profiles in a context-aware manner, maintaining task success while improving safety. Although current inference latency (up to 1.4 s) limits responsiveness in highly dynamic settings, SafeHumanoid demonstrates that semantic grounding of impedance control is a viable path toward safer, standard-compliant humanoid collaboration.
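
The mapping to joint-level impedance commands can be illustrated with a minimal sketch of one control tick. It assumes a standard joint-space impedance (PD) law driven by retrieved per-joint stiffness K and damping D, and it assumes that the retrieved nominal velocity is enforced as a per-joint speed limit while stepping toward the IK solution. The function names, the speed-limiting scheme, and the numbers are hypothetical. Gravity compensation and other feedforward terms are omitted, and the paper's controller may differ.

```python
from __future__ import annotations


def impedance_torques(q_des: list[float], q: list[float],
                      dq_des: list[float], dq: list[float],
                      stiffness: list[float], damping: list[float]) -> list[float]:
    """Joint-space impedance law, per joint i:
    tau_i = K_i * (q_des_i - q_i) + D_i * (dq_des_i - dq_i)
    Gravity compensation and other feedforward terms are omitted here.
    """
    return [k * (qd - qi) + d * (dqd - dqi)
            for k, d, qd, qi, dqd, dqi in zip(stiffness, damping, q_des, q, dq_des, dq)]


def step_toward(q_current: list[float], q_target: list[float],
                max_joint_speed: float, dt: float) -> list[float]:
    """Advance each joint toward the IK target by at most max_joint_speed * dt."""
    max_step = max_joint_speed * dt
    return [qc + max(-max_step, min(max_step, qt - qc))
            for qc, qt in zip(q_current, q_target)]


# One control tick with illustrative numbers for two joints.
q = [0.0, 0.5]       # measured joint positions (rad)
dq = [0.0, 0.2]      # measured joint velocities (rad/s)
q_ik = [1.0, 0.5]    # IK solution for the current target pose (rad)
q_des = step_toward(q, q_ik, max_joint_speed=0.5, dt=0.01)
# Desired joint velocity is set to zero for simplicity in this sketch.
tau = impedance_torques(q_des, q, [0.0, 0.0], dq,
                        stiffness=[10.0, 10.0], damping=[2.0, 2.0])
print(q_des, tau)
```

Lowering K and D makes the arm more compliant on contact, while lowering the speed limit slows how fast the setpoint moves. Both are levers a safety-aware scheduler could adjust when a human is nearby.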