
Omni-Manip: Beyond-FOV Large-Workspace Humanoid Manipulation with Omnidirectional 3D Perception

Pei Qu, Zheng Li, Yufei Jia, Ziyun Liu, Liang Zhu, Haoang Li, Jinni Zhou, Jun Ma

TL;DR

The proposed Omni-Manip is an end-to-end LiDAR-driven 3D visuomotor policy for humanoid manipulation in large workspaces. It performs robustly in large-workspace and cluttered scenarios and outperforms baselines that rely on egocentric depth cameras.

Abstract

The deployment of humanoid robots for dexterous manipulation in unstructured environments remains challenging due to perceptual limitations that constrain the effective workspace. In scenarios where physical constraints prevent the robot from repositioning itself, maintaining omnidirectional awareness becomes far more critical than color or semantic information. While recent advances in visuomotor policy learning have improved manipulation capabilities, conventional RGB-D solutions suffer from narrow fields of view (FOV) and self-occlusion, requiring frequent base movements that introduce motion uncertainty and safety risks. Existing approaches to expanding perception, including active vision systems and third-person-view cameras, introduce mechanical complexity, calibration dependencies, and latency that hinder reliable real-time performance. In this work, we propose Omni-Manip, an end-to-end LiDAR-driven 3D visuomotor policy that enables robust manipulation in large workspaces. Our method processes panoramic point clouds through a Time-Aware Attention Pooling mechanism, efficiently encoding sparse 3D data while capturing temporal dependencies. This 360° perception allows the robot to interact with objects across wide areas without frequent repositioning. To support policy learning, we develop a whole-body teleoperation system for efficiently collecting demonstrations that involve full-body coordination. Extensive experiments in simulation and real-world environments show that Omni-Manip achieves robust performance in large-workspace and cluttered scenarios, outperforming baselines that rely on egocentric depth cameras.
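The abstract names a Time-Aware Attention Pooling mechanism but does not spell out its internals. The following is a minimal NumPy sketch of one plausible design, under stated assumptions: points from several consecutive LiDAR sweeps are stacked into a single set, each point carries the time offset of its sweep, a sinusoidal time embedding is added to the per-point features, and a single learned query scores every point before softmax-weighted pooling. The function names, the sinusoidal embedding, and the `query`/`w_key` parameters are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

def time_embedding(t, dim):
    """Sinusoidal encoding of per-point time offsets (assumed design, dim must be even)."""
    assert dim % 2 == 0, "embedding dimension must be even"
    half = dim // 2
    freqs = np.exp(-np.log(10000.0) * np.arange(half) / half)  # (half,)
    angles = t[:, None] * freqs[None, :]                        # (N, half)
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)  # (N, dim)

def time_aware_attention_pool(feats, t, query, w_key):
    """Pool a variable-size set of point features into one vector.

    feats: (N, D) per-point features gathered from several LiDAR sweeps
    t:     (N,)   time offset of the sweep each point came from (0 = newest)
    query: (D,)   learned query vector (hypothetical parameter)
    w_key: (D, D) learned key projection (hypothetical parameter)
    """
    x = feats + time_embedding(t, feats.shape[1])  # inject sweep timing into each point
    keys = x @ w_key                                # (N, D)
    scores = keys @ query / np.sqrt(x.shape[1])     # (N,) scaled dot-product scores
    scores = scores - scores.max()                  # numerical stability before exp
    weights = np.exp(scores)
    weights = weights / weights.sum()               # softmax over all points
    return weights @ x, weights                     # pooled (D,), weights (N,)

# Usage with random stand-in features: three sweeps with 150, 150 and 200 points
rng = np.random.default_rng(0)
N, D = 500, 64
feats = rng.normal(size=(N, D))
t = np.repeat([-2.0, -1.0, 0.0], [150, 150, 200])
query = rng.normal(size=D)
w_key = rng.normal(size=(D, D)) / np.sqrt(D)
pooled, w = time_aware_attention_pool(feats, t, query, w_key)
print(pooled.shape)  # (64,)
```

Because the softmax weights are computed per point and then summed, the pooled vector does not depend on point order or point count, which suits sparse, variable-size LiDAR scans; the time embedding is what lets the scores distinguish recent sweeps from older ones.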

Paper Structure

This paper contains 14 sections, 2 equations, 6 figures, and 4 tables.

Figures (6)

  • Figure 1: Omni-Manip. (a) The narrow field of view of RGB-D cameras prevents perception of objects outside visual range, causing task failures and collisions in space-constrained environments where robot repositioning is difficult; (b) by utilizing panoramic 3D LiDAR perception, our approach enables humanoid robots to perform manipulation tasks across a large workspace, including areas in the camera's blind spots.
  • Figure 2: Overview of system architecture. Our system consists of four key components: (i) the panoramic perception hardware platform based on a Unitree G1 humanoid robot equipped with a Mid-360 LiDAR; (ii) a whole-body teleoperation system for efficient demonstration data collection; (iii) an end-to-end visuomotor policy learning method leveraging omnidirectional 3D perception with time-aware encoding; and (iv) real-world deployment enabling large-workspace manipulation and omnidirectional obstacle avoidance.
  • Figure 3: Comparison of the fields of view of the LiDAR and the camera. (a) The panoramic LiDAR provides 360° horizontal coverage, significantly surpassing the narrow field of view of the head-mounted RGB-D camera. (b) This enables the robot to perceive global scene context and locate target objects even when they lie in the camera's blind spots, which is essential for large-workspace manipulation tasks.
  • Figure 4: Simulation experiments and snapshots. We evaluate Omni-Manip on two simulated tasks, demonstrating its effectiveness in large-workspace manipulation scenarios with panoramic perception.
  • Figure 5: Real-world experiments and snapshots. We further validate Omni-Manip on four real-world tasks, confirming its robust performance in real-world environments and out-of-view manipulation scenarios.
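Figures 1 and 3 contrast the camera's limited horizontal coverage with the LiDAR's 360° sweep. The sketch below checks, in the horizontal plane only, whether an object's bearing falls inside a sensor's field of view. The 87° camera FOV, the object positions, and the function names are illustrative assumptions, not values from the paper; vertical FOV, mounting offsets, and occlusion are ignored.

```python
import math

CAMERA_H_FOV_DEG = 87.0   # assumed horizontal FOV of a head-mounted RGB-D camera
LIDAR_H_FOV_DEG = 360.0   # a panoramic LiDAR covers every azimuth

def azimuth_deg(x, y):
    """Bearing of point (x, y) in the robot frame (x forward, y left), in degrees."""
    return math.degrees(math.atan2(y, x))

def in_horizontal_fov(x, y, fov_deg, heading_deg=0.0):
    """True if the bearing of (x, y) lies within +/- fov_deg/2 of the sensor heading."""
    if fov_deg >= 360.0:
        return True  # full panoramic coverage
    # Wrap the angular difference into [-180, 180) before comparing.
    diff = (azimuth_deg(x, y) - heading_deg + 180.0) % 360.0 - 180.0
    return abs(diff) <= fov_deg / 2.0

# Hypothetical object positions (metres) around a stationary robot
objects = {"front": (1.0, 0.1), "left": (0.2, 0.8), "behind": (-0.7, -0.3)}
for name, (x, y) in objects.items():
    cam = in_horizontal_fov(x, y, CAMERA_H_FOV_DEG)
    lidar = in_horizontal_fov(x, y, LIDAR_H_FOV_DEG)
    print(name, cam, lidar)
# front True True
# left False True
# behind False True
```

Passing a nonzero `heading_deg` (for example 70° for the "left" object) brings that object into the camera's view, which corresponds to the head or base repositioning that panoramic LiDAR perception is meant to avoid.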