AeroGrab: A Unified Framework for Aerial Grasping in Cluttered Environments

Shivansh Pratap Singh; Naveen Sudheer Nair; Samaksh Ujjawal; Sarthak Mishra; Soham Patil; Rishabh Dev Yadav; Spandan Roy

Abstract

Reliable aerial grasping in cluttered environments remains challenging due to occlusions and collision risks. Existing aerial manipulation pipelines largely rely on centroid-based grasping and do not integrate grasp pose generation models, active exploration, and language-level task specification, so a complete end-to-end system is still missing. In this work, we present an integrated pipeline for reliable aerial grasping in cluttered environments. Given a scene and a language instruction, the system identifies the target object and actively explores to gain better views of it. During exploration, a grasp generation network predicts multiple 6-DoF grasp candidates for each view. Each candidate is evaluated using a collision-aware feasibility framework, and the overall best grasp is selected and executed using standard trajectory generation and control methods. Experiments in cluttered real-world scenarios demonstrate robust and reliable grasp execution, highlighting the effectiveness of combining active perception with feasibility-aware grasp selection for aerial manipulation.
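The selection step described in the abstract (several candidates per view, a feasibility check on each, one overall best grasp) can be illustrated with a minimal sketch. It is not the authors' implementation: the `GraspCandidate` fields, the scalar `score`, and the `is_feasible` callback are all hypothetical stand-ins, and the toy height threshold in the usage example merely takes the place of the collision-aware feasibility framework, whose details are not given here.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional, Sequence, Tuple


@dataclass(frozen=True)
class GraspCandidate:
    """Hypothetical container for one 6-DoF grasp proposal."""
    position: Tuple[float, float, float]            # x, y, z in an assumed world frame
    orientation: Tuple[float, float, float, float]  # assumed unit quaternion (x, y, z, w)
    score: float                                    # assumed network confidence
    view_id: int                                    # index of the view that produced it


def select_best_grasp(
    views: Iterable[Sequence[GraspCandidate]],
    is_feasible: Callable[[GraspCandidate], bool],
) -> Optional[GraspCandidate]:
    """Return the highest-scoring feasible candidate over all views, or None."""
    best: Optional[GraspCandidate] = None
    for candidates in views:
        for cand in candidates:
            # Discard candidates rejected by the (assumed) feasibility check.
            if not is_feasible(cand):
                continue
            if best is None or cand.score > best.score:
                best = cand
    return best


if __name__ == "__main__":
    identity = (0.0, 0.0, 0.0, 1.0)
    views = [
        [GraspCandidate((0.4, 0.0, 0.2), identity, 0.9, 0),
         GraspCandidate((0.4, 0.1, 0.3), identity, 0.6, 0)],
        [GraspCandidate((0.5, 0.0, 0.4), identity, 0.8, 1)],
    ]
    # Toy feasibility rule: reject grasps below 0.25 m. This is only a
    # placeholder for a real collision check.
    best = select_best_grasp(views, lambda g: g.position[2] >= 0.25)
    print(best.view_id, best.score)  # prints: 1 0.8
```

In this sketch the highest-scoring candidate overall (0.9) is rejected by the feasibility rule, so a lower-scoring grasp from a different view wins. A `None` result means no candidate passed the check; what the system does in that case is left to the caller.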

Paper Structure (21 sections, 6 equations, 5 figures, 3 tables, 1 algorithm)

Introduction
Related Work and Contribution
Methodology
Language-Guided Target Localization
Target Synthesis
Collision Feasibility Filtering
Batched Trajectory Collision Evaluation
Decision Logic and Grasp Execution
Active Re-Positioning (Unsafe Grasp)
Approach and Execution (Safe Grasp)
Control and Trajectory Execution
Experiments
Experimental Platform
Simulation Environment and Setup
Scenario Definitions and Task Procedures
...and 6 more sections

Figures (5)

Figure 1: Architecture Overview. A unified aerial grasping framework that tightly couples language-guided semantic perception with platform-aware kinematic and collision constraints, enabling safe interaction in a cluttered environment.
Figure 2: Unified perception, planning, and control pipeline for aerial grasping in clutter. The framework connects language grounding, active perception, 6-DoF grasp generation, platform-aware feasibility analysis, and collision-safe execution.
Figure 3: Experiment Scenarios. (a) Tabletop clutter, (b) window-constrained access, (c) shelf reachability test. For each scenario, we show the scene view, trajectory, and successful grasp.
Figure 4: System Hardware Overview. Custom 450 mm quadrotor platform featuring an Orin Nano Super, ZED 2i stereo camera, and a 4-DoF manipulator.
Figure 5: Simulation Setup Overview.
