AeroGrab: A Unified Framework for Aerial Grasping in Cluttered Environments

Shivansh Pratap Singh, Naveen Sudheer Nair, Samaksh Ujjawal, Sarthak Mishra, Soham Patil, Rishabh Dev Yadav, Spandan Roy

Abstract

Reliable aerial grasping in cluttered environments remains challenging due to occlusions and collision risks. Existing aerial manipulation pipelines largely rely on centroid-based grasping and do not integrate grasp pose generation models, active exploration, and language-level task specification, so a complete end-to-end system is still missing. In this work, we present an integrated pipeline for reliable aerial grasping in cluttered environments. Given a scene and a language instruction, the system identifies the target object and actively explores around it to obtain better views. During exploration, a grasp generation network predicts multiple 6-DoF grasp candidates for each view. Each candidate is evaluated using a collision-aware feasibility framework, and the overall best grasp is selected and executed using standard trajectory generation and control methods. Experiments in cluttered real-world scenarios demonstrate robust and reliable grasp execution, highlighting the effectiveness of combining active perception with feasibility-aware grasp selection for aerial manipulation.
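
The selection step described in the abstract (pool grasp candidates from several views, discard infeasible ones, execute the best of the rest) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the paper: the names `GraspCandidate`, `SphereObstacle`, `is_feasible`, and `select_best_grasp` are hypothetical, obstacles are approximated as spheres, feasibility is reduced to a reach limit plus a clearance margin, and grasp orientation is omitted.

```python
from dataclasses import dataclass
from math import dist
from typing import Optional, Sequence, Tuple

Point = Tuple[float, float, float]


@dataclass(frozen=True)
class GraspCandidate:
    """Hypothetical container for one predicted grasp (orientation omitted)."""
    position: Point  # grasp position in a common world frame, metres
    score: float     # grasp-network confidence, assumed to lie in [0, 1]
    view_id: int     # index of the exploration view that produced it


@dataclass(frozen=True)
class SphereObstacle:
    """Assumed obstacle model: clutter approximated by spheres."""
    center: Point
    radius: float


def is_feasible(grasp: GraspCandidate, base: Point,
                obstacles: Sequence[SphereObstacle],
                max_reach: float, clearance: float) -> bool:
    """Simplified stand-in for a collision-aware feasibility check."""
    # Reject grasps the manipulator cannot reach from the given base position.
    if dist(grasp.position, base) > max_reach:
        return False
    # Reject grasps closer to any obstacle surface than the clearance margin.
    return all(dist(grasp.position, o.center) >= o.radius + clearance
               for o in obstacles)


def select_best_grasp(candidates: Sequence[GraspCandidate], base: Point,
                      obstacles: Sequence[SphereObstacle],
                      max_reach: float = 0.6,
                      clearance: float = 0.05) -> Optional[GraspCandidate]:
    """Return the highest-scoring feasible candidate across all views, or None."""
    feasible = [g for g in candidates
                if is_feasible(g, base, obstacles, max_reach, clearance)]
    return max(feasible, key=lambda g: g.score, default=None)
```

Pooling candidates from all views before filtering means a grasp from a later, less occluded view can replace a higher-scoring but infeasible grasp from an earlier one, which is the motivation the abstract gives for combining active perception with feasibility-aware selection.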

Paper Structure

This paper contains 21 sections, 6 equations, 5 figures, 3 tables, 1 algorithm.

Figures (5)

  • Figure 1: Architecture Overview. A unified aerial grasping framework that tightly couples language-guided semantic perception with platform-aware kinematic and collision constraints, enabling safe interaction in a cluttered environment.
  • Figure 2: Unified perception, planning, and control pipeline for aerial grasping in clutter. The framework connects language grounding, active perception, 6-DoF grasp generation, platform-aware feasibility analysis, and collision-safe execution.
  • Figure 3: Experiment Scenarios. (a) Tabletop clutter, (b) window-constrained access, (c) shelf reachability test. For each scenario, we show the scene view, trajectory, and successful grasp.
  • Figure 4: System Hardware Overview. Custom 450 mm quadrotor platform featuring an Orin Nano Super, ZED 2i stereo camera, and a 4-DoF manipulator.
  • Figure 5: Simulation Setup Overview.