ARMOR: Egocentric Perception for Humanoid Robot Collision Avoidance and Motion Planning

Daehwa Kim, Mario Srouji, Chen Chen, Jian Zhang

TL;DR

Humanoid robots suffer from sensing gaps and occlusions during manipulation in dense environments. The paper proposes ARMOR, an egocentric perception system using distributed ToF depth sensors and a transformer-based imitation-learning planner called ARMOR-Policy, trained on ~86 hours of AMASS data and optimized with inference-time trajectory sampling. In simulation, ARMOR perception yields 63.7% fewer collisions and a 78.7% higher success rate than a setup with head-mounted and externally mounted depth cameras, and ARMOR-Policy yields 31.6% fewer collisions, a 16.9% higher success rate, and 26x lower computational latency than the sampling-based planner cuRobo. The perception system is also deployed on a real-world GR1 humanoid. This work demonstrates the feasibility and benefits of low-occlusion, wearable sensing for fast, safe humanoid robot motion planning and provides a path toward broader dexterous manipulation in real environments.

Abstract

Humanoid robots have significant gaps in their sensing and perception, making it hard to perform motion planning in dense environments. To address this, we introduce ARMOR, a novel egocentric perception system that integrates both hardware and software, specifically incorporating wearable-like depth sensors for humanoid robots. Our distributed perception approach enhances the robot's spatial awareness and facilitates more agile motion planning. We also train a transformer-based imitation learning (IL) policy in simulation to perform dynamic collision avoidance, leveraging around 86 hours of realistic human motions from the AMASS dataset. We show that our ARMOR perception is superior to a setup with multiple dense head-mounted and externally mounted depth cameras, with a 63.7% reduction in collisions and a 78.7% improvement in success rate. We also compare our IL policy against a sampling-based motion planning expert, cuRobo, showing 31.6% fewer collisions, a 16.9% higher success rate, and a 26x reduction in computational latency. Lastly, we deploy our ARMOR perception on our real-world GR1 humanoid from Fourier Intelligence. We will update the link to the source code, hardware description, and 3D CAD files in the arXiv version of this text.

Paper Structure

This paper contains 15 sections, 1 equation, 7 figures, 1 table.

Figures (7)

  • Figure 1: ARMOR presents a novel egocentric wearable perception hardware and software system for humanoid robots (left). Low-profile and distributed depth sensors enable comprehensive point cloud perception around the robot, and minimize occlusions (right). With a data-driven motion planning policy, ARMOR-Policy, we are able to steer attention to specific regions, and demonstrate effective and fast motion planning.
  • Figure 2: ARMOR's egocentric perception hardware in simulation (left), and deployed on the real robot (right).
  • Figure 3: ARMOR-Policy's neural motion planner network architecture. Left: The Behavior Encoder compresses action sequences into style variable $z$, which is later used for diverse output sampling. Right: We implemented the policy decoder to take depth images as input. The depth image is in the lidar camera frame.
  • Figure 4: Three data generation strategies. In collision-avoidance motion, a 1-second sequence of human motion in the AMASS dataset is used as a motion planning expert, and the obstacles are placed tightly around, but not colliding with, the motion trajectory. In an emergency stop, the goal pose is randomly chosen inside of the obstacle location. In collision-free motion, we remove all obstacles and linearly interpolate the trajectory from the initial pose to the goal.
  • Figure 5: Experiment setup. The yellow geometries indicate depth cameras: Intel RealSense D435 (Exocentric) and VL53L5CX ToF sensor (Egocentric).
  • ...and 2 more figures
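
The collision-free motion strategy in the Figure 4 caption (remove all obstacles, then linearly interpolate from the initial pose to the goal) can be illustrated with a minimal sketch. The function name `interpolate_trajectory`, the representation of a pose as a flat list of joint values, and the choice to interpolate in joint space are assumptions made for illustration; the caption does not specify how the authors implement this step.

```python
from typing import List, Sequence


def interpolate_trajectory(
    start: Sequence[float], goal: Sequence[float], num_steps: int
) -> List[List[float]]:
    """Return num_steps poses evenly spaced from start to goal, endpoints included.

    Hypothetical helper: a pose is assumed to be a flat list of joint values.
    """
    if len(start) != len(goal):
        raise ValueError("start and goal must have the same number of joints")
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")

    trajectory = []
    for k in range(num_steps):
        t = k / (num_steps - 1)  # interpolation parameter, 0.0 at start and 1.0 at goal
        trajectory.append([(1.0 - t) * s + t * g for s, g in zip(start, goal)])
    return trajectory


if __name__ == "__main__":
    # Illustrative values only; these are not taken from the paper.
    for pose in interpolate_trajectory([0.0, 0.5, -1.0], [1.0, 0.5, 1.0], num_steps=5):
        print(pose)
```

With no obstacles in the scene, a straight line between the two poses is trivially collision-free with respect to the environment, which is presumably why this case needs no expert motion from AMASS.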