SpikeGrasp: A Benchmark for 6-DoF Grasp Pose Detection from Stereo Spike Streams
Zhuoheng Gao, Jiyao Zhang, Zhiyong Xie, Hao Dong, Zhaofei Yu, Rongmei Chen, Guozhang Chen, Tiejun Huang
TL;DR
SpikeGrasp presents a neuro-inspired framework for 6-DoF grasp pose detection from raw stereo spike streams, bypassing point-cloud reconstruction. It combines a Visual Pathway Network with a recurrent spiking neural network to iteratively refine a latent grasp-affordance state, followed by Graspable and Grasp Detection networks that output 6-DoF grasps. A large-scale synthetic spike-stream dataset supports end-to-end training and evaluation, demonstrating strong data efficiency and competitive or superior performance in cluttered and textureless scenes, with promising sim-to-real transfer. This work highlights the potential of neuromorphic, spike-based perception for fast, robust robotic manipulation in dynamic environments.
Abstract
Most robotic grasping systems rely on converting sensor data into explicit 3D point clouds, a computational step not found in biological intelligence. This paper explores a fundamentally different, neuro-inspired paradigm for 6-DoF grasp detection. We introduce SpikeGrasp, a framework that mimics the biological visuomotor pathway: it processes raw, asynchronous events from stereo spike cameras, which play a role similar to retinas, and infers grasp poses directly. Our model fuses the two stereo spike streams and uses a recurrent spiking neural network, analogous to high-level visual processing, to iteratively refine grasp hypotheses without ever reconstructing a point cloud. To validate this approach, we built a large-scale synthetic benchmark dataset. Experiments show that SpikeGrasp surpasses traditional point-cloud-based baselines, especially in cluttered and textureless scenes, and demonstrates remarkable data efficiency. By establishing the viability of this end-to-end, neuro-inspired approach, SpikeGrasp paves the way for future systems capable of the fluid and efficient manipulation seen in nature, particularly for dynamic objects.
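
The iterative refinement by a recurrent spiking network can be illustrated with a toy example. The sketch below is not the SpikeGrasp architecture: it assumes a generic leaky integrate-and-fire (LIF) neuron with a hard reset, and the names `lif_step` and `refine` as well as the parameters `decay`, `threshold`, `feedback`, and `steps` are hypothetical, chosen only for illustration. It shows how a fixed input feature vector, combined with feedback from the previous step's spikes, can be unrolled over a few time steps into a firing-rate vector that stands in for a latent state.

```python
def lif_step(v, input_current, decay=0.9, threshold=1.0):
    """One leaky integrate-and-fire update for a list of neurons.

    Returns (new membrane potentials, binary spikes).
    All parameter values are illustrative assumptions.
    """
    new_v, spikes = [], []
    for vi, xi in zip(v, input_current):
        vi = decay * vi + xi                 # leak, then integrate the input
        fired = vi >= threshold              # spike when the threshold is reached
        spikes.append(1 if fired else 0)
        new_v.append(0.0 if fired else vi)   # hard reset after a spike
    return new_v, spikes


def refine(features, steps=4, feedback=0.5):
    """Unroll a toy recurrent LIF layer and return per-neuron firing rates."""
    v = [0.0] * len(features)
    spikes = [0] * len(features)
    rate = [0.0] * len(features)
    for _ in range(steps):
        # Recurrent drive: feed-forward features plus the previous step's spikes.
        drive = [f + feedback * s for f, s in zip(features, spikes)]
        v, spikes = lif_step(v, drive)
        rate = [r + s / steps for r, s in zip(rate, spikes)]
    return rate  # firing rate per neuron, used here as a stand-in latent state


if __name__ == "__main__":
    print(refine([0.0, 0.6, 1.2]))
```

In this toy setting, a neuron with no input never fires, while a strongly driven neuron fires at every step; the feedback term lets earlier spikes raise the drive at later steps, which is the sense in which the state is refined over iterations.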
