SonicSense: Object Perception from In-Hand Acoustic Vibration

Jiaxun Liu, Boyuan Chen

TL;DR

SonicSense introduces a cost-effective, four-finger robot hand equipped with contact microphones to sense in-hand acoustic vibrations. Through a heuristic interaction policy and end-to-end models, it achieves material classification, 3D shape reconstruction, and object re-identification across 83 real-world objects, including complex geometries and heterogeneous materials. The work demonstrates robustness to ambient noise, leverages synthetic data augmentation for shape learning, and shows strong task performance with dedicated datasets and evaluation. This holistic approach advances tactile perception in robotics by moving beyond small, controlled object sets to diverse, real-world scenarios, enabling richer object understanding from acoustic cues.

Abstract

We introduce SonicSense, a holistic design of hardware and software to enable rich robot object perception through in-hand acoustic vibration sensing. While previous studies have shown promising results with acoustic sensing for object perception, current solutions are constrained to a handful of objects with simple geometries and homogeneous materials, single-finger sensing, and mixing training and testing on the same objects. SonicSense enables container inventory status differentiation, heterogeneous material prediction, 3D shape reconstruction, and object re-identification from a diverse set of 83 real-world objects. Our system employs a simple but effective heuristic exploration policy to interact with the objects as well as end-to-end learning-based algorithms to fuse vibration signals to infer object properties. Our framework underscores the significance of in-hand acoustic vibration sensing in advancing robot tactile perception.

Paper Structure

This paper contains 17 sections, 10 equations, 17 figures, and 11 tables.

Figures (17)

  • Figure 1: SonicSense enables container inventory status differentiation, heterogeneous material prediction, 3D shape reconstruction, and object re-identification on a diverse set of 83 real-world objects.
  • Figure 2: Our robot hand includes four fingers where each fingertip is equipped with one contact microphone and a counterweight.
  • Figure 3: (A) 83 real-world objects: 54 everyday objects and 29 3D-printed primitive objects with different materials attached to their surfaces. (B) The composition of the nine materials and multi-material vs. single-material objects.
  • Figure 4: The network architectures. (A) The material classification network takes one Mel-spectrogram A from a single tapping position through several convolutional and MLP layers and outputs the material label m. (B) The shape reconstruction network takes a set of sparse contact points C through the Point Completion Network (PCN) encoder and MLP layers to output a dense, completed object point cloud P. (C) The object re-identification model takes the 15-channel Mel-spectrogram A through several convolutional layers and the corresponding 15 contact positions C through PCN encoders. After fusing the features from the two networks through several MLP layers, the model outputs the final object label O.
  • Figure 5: We conducted synthetic data collection of contact points on a large number of 3D objects in the simulation for data augmentation.
  • ...and 12 more figures
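
To make the fusion architecture in Figure 4C concrete, the sketch below traces only the tensor shapes of the re-identification data flow with toy stand-in encoders. The 15 tapping positions and 83 object classes come from the paper; the Mel-spectrogram size, the mean-pool "audio encoder", the flattening "point encoder", and the single random fusion layer are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

# Hypothetical shape-level sketch of the Figure 4C data flow.
# Only N_TAPS (15) and N_OBJECTS (83) come from the paper; everything
# else (spectrogram size, encoders, fusion layer) is assumed for illustration.
rng = np.random.default_rng(0)

N_TAPS = 15                 # contact positions per interaction episode (paper)
N_OBJECTS = 83              # object re-identification classes (paper)
MEL_BINS, FRAMES = 64, 100  # assumed Mel-spectrogram dimensions

def encode_audio(specs):
    """Toy stand-in for the convolutional audio encoder: mean-pool over time."""
    return specs.mean(axis=-1)          # (N_TAPS, MEL_BINS)

def encode_points(points):
    """Toy stand-in for the PCN point encoder: flatten the xyz contacts."""
    return points.reshape(-1)           # (N_TAPS * 3,)

# One episode: 15 tap spectrograms plus their 3D contact positions.
specs = rng.normal(size=(N_TAPS, MEL_BINS, FRAMES))
contacts = rng.normal(size=(N_TAPS, 3))

# Fuse the two feature streams, then map to object logits with one random layer
# standing in for the MLP fusion head.
features = np.concatenate([encode_audio(specs).reshape(-1),
                           encode_points(contacts)])
W = rng.normal(size=(N_OBJECTS, features.shape[0]))
logits = W @ features
predicted_object = int(np.argmax(logits))
print(features.shape, logits.shape, predicted_object)
```

With random weights the predicted label is meaningless; the point is that 15 audio features and 15 contact positions reduce to a single fused vector before the 83-way classification, which is the structural claim of Figure 4C.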