An Investigation of Multi-feature Extraction and Super-resolution with Fast Microphone Arrays
Eric T. Chang, Runsheng Wang, Peter Ballentine, Jingxi Xu, Trey Smith, Brian Coltin, Ioannis Kymissis, Matei Ciocarlie
TL;DR
This work demonstrates that a sparse MEMS microphone array embedded under a PDMS layer can support multiple tactile tasks (texture classification, contact localization, and drag velocity estimation) using a transformer-based time-series analysis framework. By operating on short time windows of high-rate microphone data, the method achieves 77.3% average accuracy on 4-class texture classification (84.2% excluding the slowest drag velocity), 1.8 mm mean localization error, and 5.6 mm/s mean velocity error, and the texture and localization models generalize to unseen velocities. The study also shows fast contact detection, with average response times in the low-millisecond range, highlighting the potential of MEMS microphone arrays as a low-cost, space-efficient tactile modality that can complement other sensing modalities. Overall, the findings inform sensor design by illustrating what tactile information can be extracted from a sparse microphone array and how data-driven time-series methods enable its extraction.
Abstract
In this work, we use MEMS microphones as vibration sensors to simultaneously classify texture and estimate contact position and velocity. Vibration sensors are an important facet of both human and robotic tactile sensing, providing fast detection of contact and onset of slip. Microphones are an attractive option for implementing vibration sensing as they offer a fast response and can be sampled quickly, are affordable, and occupy a very small footprint. Our prototype sensor uses only a sparse array (8-9 mm spacing) of distributed MEMS microphones (<$1, 3.76 x 2.95 x 1.10 mm) embedded under an elastomer. We use transformer-based architectures for data analysis, taking advantage of the microphones' high sampling rate to run our models on time-series data as opposed to individual snapshots. This approach allows us to obtain 77.3% average accuracy on 4-class texture classification (84.2% when excluding the slowest drag velocity), 1.8 mm mean error on contact localization, and 5.6 mm/s mean error on contact velocity. We show that the learned texture and localization models are robust to varying velocity and generalize to unseen velocities. We also report that our sensor provides fast contact detection, an important advantage of fast transducers. This investigation illustrates the capabilities one can achieve with a MEMS microphone array alone, leaving valuable sensor real estate available for integration with complementary tactile sensing modalities.
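To make concrete the idea of running models on time-series data rather than individual snapshots, the following is a minimal sketch of slicing a multi-channel recording into overlapping windows that a sequence model could consume. It is an illustration only, not the authors' pipeline: the function name `sliding_windows`, the window length, the hop size, and the three-channel toy recording are all assumptions made for this example.

```python
from typing import List, Sequence


def sliding_windows(
    samples: Sequence[Sequence[float]], window: int, hop: int
) -> List[List[Sequence[float]]]:
    """Split a (time x channels) recording into overlapping windows.

    Each returned window holds `window` consecutive time steps, and
    consecutive windows start `hop` time steps apart. Trailing samples
    that do not fill a whole window are dropped.
    """
    if window <= 0 or hop <= 0:
        raise ValueError("window and hop must be positive")
    last_start = len(samples) - window
    return [
        list(samples[start:start + window])
        for start in range(0, last_start + 1, hop)
    ]


# Hypothetical toy recording: 10 time steps from 3 microphones.
# Real data would come from the sensor at its native sampling rate.
recording = [[float(t + 10 * ch) for ch in range(3)] for t in range(10)]

# Assumed window of 4 samples with a hop of 2 samples (50% overlap).
windows = sliding_windows(recording, window=4, hop=2)
print(len(windows), len(windows[0]), len(windows[0][0]))
```

Each window keeps the per-microphone ordering intact, so a downstream model sees how the signal on every channel evolves over the window instead of a single instantaneous reading.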
