Vision-Guided Loco-Manipulation with a Snake Robot

Adarsh Salagame, Sasank Potluri, Keshav Bharadwaj Vaidyanathan, Kruthika Gangaraju, Eric Sihite, Milad Ramezani, Alireza Ramezani

TL;DR

This work addresses autonomous loco-manipulation for a highly articulated snake robot (COBRA) by fusing onboard vision and control to detect objects several meters away, estimate their pose, and manipulate them. The authors implement a vision-guided pipeline that combines onboard YOLOv8 Nano-based detection, depth-assisted 6-DOF pose estimation, and a COM-based path-tracking framework optimized via nonlinear programming to enable closed-loop docking and object transport. Key contributions include end-to-end integration of perception and control on COBRA, an evaluation comparing YOLOv8 Nano and Mask2Former for docking-module detection, and a demonstration of real-time onboard detection, pose estimation, and open-loop loco-manipulation. The results illustrate the feasibility of autonomous manipulation in confined and challenging environments; future work focuses on perception robustness, feedback control integration, and adaptation to dynamic or cluttered settings.

Abstract

This paper presents the development and integration of a vision-guided loco-manipulation pipeline for Northeastern University's snake robot, COBRA. The system leverages a YOLOv8-based object detection model and depth data from an onboard stereo camera to estimate the 6-DOF pose of target objects in real time. We introduce a framework for autonomous detection and control, enabling closed-loop loco-manipulation for transporting objects to specified goal locations. Additionally, we demonstrate open-loop experiments in which COBRA successfully performs real-time object detection and loco-manipulation tasks.
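
As an illustration of how a 2D detection and a depth image can be combined, the sketch below back-projects the center of a bounding box into a 3D point in the camera frame using the standard pinhole camera model. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names, the bounding-box layout, and the intrinsics (fx, fy, cx, cy) are hypothetical, and it recovers only the translational part of the pose. Estimating orientation for a full 6-DOF pose would need additional steps that the abstract does not detail.

```python
from statistics import median
from typing import List, Optional, Tuple

Point3D = Tuple[float, float, float]


def deproject_pixel(u: float, v: float, depth_m: float,
                    fx: float, fy: float, cx: float, cy: float) -> Point3D:
    """Back-project pixel (u, v) at a given depth into the camera frame.

    Uses the pinhole model without lens distortion (an assumption).
    """
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return (x, y, depth_m)


def bbox_to_camera_point(bbox: Tuple[int, int, int, int],
                         depth_image: List[List[float]],
                         fx: float, fy: float,
                         cx: float, cy: float) -> Optional[Point3D]:
    """Estimate the 3D position of a detected object.

    bbox is (x_min, y_min, x_max, y_max) in pixels (hypothetical layout);
    depth_image holds depth in meters, with 0.0 marking invalid pixels.
    """
    x_min, y_min, x_max, y_max = bbox
    # Collect valid depth samples inside the box; the median is less
    # sensitive to holes and background pixels than the mean.
    samples = [depth_image[row][col]
               for row in range(y_min, y_max)
               for col in range(x_min, x_max)
               if depth_image[row][col] > 0.0]
    if not samples:
        return None  # no usable depth inside the detection
    z = median(samples)
    u = (x_min + x_max) / 2.0
    v = (y_min + y_max) / 2.0
    return deproject_pixel(u, v, z, fx, fy, cx, cy)
```

In practice the bounding box would come from the detector and the intrinsics from the camera's calibration; the values used to exercise this sketch are placeholders.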

Paper Structure

This paper contains 11 sections, 8 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: COBRA [salagame_validation_2024, salagame_loco-manipulation_2024, salagame_reinforcement_2024, salagame_how_2024] sidewinding toward the target for loco-manipulation (Inset: View from the onboard camera with detection of the target object)
  • Figure 2: Full proposed system pipeline
  • Figure 3: (Above Right) Close-up view of the head with actuated fins. (Below Right) Docking module attached to an object for loco-manipulation. (Below Left) NVIDIA Jetson Orin NX and Intel RealSense D435i mounted in the head module
  • Figure 4: Perception system overview
  • Figure 5: Detection of the docking module using Mask2Former (left) and YOLOv8 (right)
  • ...and 4 more figures