VacuumVLA: Boosting VLA Capabilities via a Unified Suction and Gripping Tool for Complex Robotic Manipulation

Hui Zhou, Siyuan Huang, Minxing Li, Hao Zhang, Lue Fan, Shaoshuai Shi

TL;DR

VacuumVLA tackles a hardware bottleneck in Vision-Language-Action (VLA) robotics by introducing a low-cost end-effector that combines a two-finger gripper with vacuum suction. The authors integrate this hybrid hardware with two VLA frameworks (DexVLA and Pi0) to enable dual-mode manipulation and evaluate it on four long-horizon tasks, including tasks involving handleless drawers and cardboard boxes. They report that the hybrid end-effector enables tasks that are infeasible for standard parallel grippers, and that success rates vary with the base model, with Pi0 showing better precision on some tasks. The work points to a practical path toward hardware-aware VLAs; the authors state that all hardware designs and control software will be released.

Abstract

Vision-Language-Action (VLA) models have significantly advanced general-purpose robotic manipulation by harnessing large-scale pretrained vision and language representations. Most current VLA systems employ parallel two-finger grippers as their default end-effectors. However, such grippers face inherent limitations in certain real-world tasks, such as wiping glass surfaces or opening drawers without handles, due to insufficient contact area or lack of adhesion. To overcome these challenges, we present a low-cost, integrated hardware design that combines a mechanical two-finger gripper with a vacuum suction unit, enabling dual-mode manipulation within a single end-effector. Our system supports flexible switching between, or synergistic use of, both modalities, expanding the range of feasible tasks. We validate the efficiency and practicality of our design within two state-of-the-art VLA frameworks: DexVLA and Pi0. Experimental results demonstrate that with the proposed hybrid end-effector, robots can successfully perform multiple complex tasks that are infeasible for conventional two-finger grippers alone. All hardware designs and control systems will be released.

Paper Structure

This paper contains 16 sections, 8 figures, 2 tables.

Figures (8)

  • Figure 1: Illustration of our end-effector with integrated gripper and suction cup.
  • Figure 2: Hardware Details.
  • Figure 3: Prime actions: Adjustable gripping width for varying object sizes (first two). The remaining three primitives are based on the standard two-finger gripper.
  • Figure 4: VacuumVLA (based on DexVLA) architecture.
  • Figure 5: Weight suction test. Panels (a)–(e) show five moments of the test; the last image is an enlarged view of the item placed on the electronic scale.
  • ...and 3 more figures