Optimizing Grasping in Legged Robots: A Deep Learning Approach to Loco-Manipulation
Dilermando Almeida, Guilherme Lazzarini, Juliano Negri, Thiago H. Segreto, Ricardo V. Godoy, Marcelo Becker
TL;DR
This work tackles robust grasping for loco-manipulation in quadruped robots by building a sim-to-real pipeline. It generates a large synthetic RGB-D grasp dataset in Genesis, trains a U-Net–style CNN to produce pixel-wise grasp-quality heatmaps from multi-modal inputs, and validates the approach on a real Spot robot equipped with a manipulator arm. The key contributions are synthetic data creation with per-pixel grasp labels, a multi-modal grasp predictor, and a complete deployment pipeline from perception to manipulation. The approach demonstrates scalable, autonomous object handling in unstructured environments, with implications for rescue, logistics, and domestic robotics.
Abstract
This paper presents a deep learning framework designed to enhance the grasping capabilities of quadrupeds equipped with arms, with a focus on improving precision and adaptability. Our approach centers on a sim-to-real methodology that minimizes reliance on physical data collection. We developed a pipeline within the Genesis simulation environment to generate a synthetic dataset of grasp attempts on common objects. By simulating thousands of interactions from various perspectives, we created pixel-wise annotated grasp-quality maps to serve as the ground truth for our model. This dataset was used to train a custom CNN with a U-Net-like architecture that processes multi-modal input derived from onboard RGB and depth cameras: RGB images, depth maps, segmentation masks, and surface normal maps. The trained model outputs a grasp-quality heatmap from which the optimal grasp point is identified. We validated the complete framework on a four-legged robot. The system successfully executed a full loco-manipulation task: autonomously navigating to a target object, perceiving it with its sensors, predicting the optimal grasp pose using our model, and performing a precise grasp. This work demonstrates that combining simulated training with advanced sensing offers a scalable and effective solution for object handling.
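To make the input and output of the predictor concrete, the following is a minimal sketch and not the authors' implementation. It assumes the four modalities are stacked channel-wise into one 8-channel array (3 RGB, 1 depth, 1 mask, 3 normals) and that the grasp point is taken as the highest-scoring heatmap pixel on the segmented object. The channel order, the RGB scaling, the mask restriction, and the function names `stack_inputs` and `best_grasp_pixel` are all illustrative assumptions.

```python
import numpy as np


def stack_inputs(rgb, depth, mask, normals):
    """Concatenate the modalities into one (H, W, 8) float32 array.

    Channel layout is an assumption: RGB (3), depth (1), mask (1), normals (3).
    """
    return np.concatenate(
        [
            rgb.astype(np.float32) / 255.0,       # (H, W, 3), scaled to [0, 1]
            depth.astype(np.float32)[..., None],  # (H, W, 1)
            mask.astype(np.float32)[..., None],   # (H, W, 1)
            normals.astype(np.float32),           # (H, W, 3)
        ],
        axis=-1,
    )


def best_grasp_pixel(heatmap, mask=None):
    """Return (row, col, score) of the highest-scoring pixel.

    If a mask is given, pixels outside it are ignored (an assumption made
    here so that the selected point always lies on the object).
    """
    scores = heatmap.astype(np.float32)
    if mask is not None:
        scores = np.where(mask.astype(bool), scores, -np.inf)
    row, col = np.unravel_index(np.argmax(scores), scores.shape)
    return int(row), int(col), float(scores[row, col])


# Tiny synthetic example standing in for real sensor data and a real model.
H, W = 4, 5
rgb = np.zeros((H, W, 3), dtype=np.uint8)
depth = np.ones((H, W))
mask = np.zeros((H, W), dtype=np.uint8)
mask[1:3, 1:4] = 1                      # object occupies a small patch
normals = np.zeros((H, W, 3))
normals[..., 2] = 1.0                   # all normals face the camera

x = stack_inputs(rgb, depth, mask, normals)

heatmap = np.zeros((H, W))              # placeholder for the network output
heatmap[2, 3] = 0.9                     # best score on the object
heatmap[0, 0] = 0.95                    # higher score, but off the object

row, col, score = best_grasp_pixel(heatmap, mask)
```

In this toy example the off-object peak at pixel (0, 0) is discarded and the pixel at row 2, column 3 is selected. Turning that pixel into a 3D grasp pose would additionally require the depth value and the camera intrinsics, which are outside this sketch.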
