Acoustic neural networks: Identifying design principles and exploring physical feasibility
Ivan Kalthoff, Marcel Rey, Raphael Wittkowski
TL;DR
This work proposes a digital-twin framework for acoustic neural networks that restricts computation to passive, non-negative acoustic operations, with weights in $[0,1]$ and activations compatible with intensity-based signals. It develops three architectures, namely recurrent neural networks (RNNs), hierarchical subsampling RNNs (HSRNNs), and SincHSRNNs with learnable sinc front ends, and evaluates them on AudioMNIST. A physically realizable SincHSRNN reaches up to approximately $95\%$ test accuracy at 8 kHz while remaining interpretable through its bandpass filters. The study identifies general design principles: non-negativity acts as a regularizer, hierarchical temporal integration is necessary, and frequency-selective preprocessing is valuable for acoustic implementations. By mapping learned parameters to measurable attenuation and transmission, the work outlines a concrete pathway toward low-power, wave-based neural computing and direct analog acoustic processing at the edge.
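To make these constraints concrete, the following is a minimal NumPy sketch of a bias-free recurrent layer whose inputs, weights, and hidden states are all non-negative, followed by one hierarchical subsampling step. Several choices here are assumptions, not details taken from the paper: weights are kept in $(0,1)$ by a sigmoid reparameterization (clipping to $[0,1]$ would be an alternative), tanh stands in for the intensity-compatible nonlinearity, and subsampling concatenates $k$ consecutive hidden states as in classic hierarchical subsampling RNNs. The function names are hypothetical.

```python
import numpy as np

# Hypothetical helpers illustrating the constraints; not the authors' code.

def to_unit_interval(theta):
    """Map unconstrained parameters to weights in (0, 1) via a sigmoid (assumed choice)."""
    return 1.0 / (1.0 + np.exp(-theta))

def constrained_rnn(x, theta_in, theta_rec):
    """Bias-free RNN with non-negative inputs, weights, and states.

    x: (T, n_in) non-negative inputs, e.g. intensities.
    theta_in: (n_hidden, n_in) unconstrained input parameters.
    theta_rec: (n_hidden, n_hidden) unconstrained recurrent parameters.
    Returns hidden states of shape (T, n_hidden) with entries in [0, 1).
    """
    w_in = to_unit_interval(theta_in)    # transmission-like factors in (0, 1)
    w_rec = to_unit_interval(theta_rec)
    h = np.zeros(w_in.shape[0])
    states = []
    for x_t in x:
        z = w_in @ x_t + w_rec @ h       # no bias term; z >= 0 since every term is >= 0
        h = np.tanh(z)                   # assumed nonlinearity: maps [0, inf) to [0, 1)
        states.append(h)
    return np.stack(states)

def subsample(h, k):
    """Concatenate k consecutive states so the next level runs at 1/k of the rate."""
    T = (h.shape[0] // k) * k            # drop trailing steps that do not fill a group
    return h[:T].reshape(T // k, k * h.shape[1])

# Two-level example: 12 input steps -> 12 states -> 4 grouped steps -> 4 states.
rng = np.random.default_rng(0)
x = rng.random((12, 4))
h1 = constrained_rnn(x, rng.normal(size=(8, 4)), rng.normal(size=(8, 8)))
h2 = constrained_rnn(subsample(h1, 3), rng.normal(size=(8, 24)), rng.normal(size=(8, 8)))
```

Because every input, weight, and state is non-negative and no bias is added, the pre-activation can never become negative, which is what allows the signals to be read as acoustic intensities.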
Abstract
Waveguide-based physical systems provide a promising route toward energy-efficient analog computing beyond traditional electronics. Within this landscape, acoustic neural networks offer a way to achieve low-power computation in environments where electronics are inefficient or limited, yet their systematic design has remained largely unexplored. Here we introduce a framework for designing and simulating acoustic neural networks, which perform computation through the propagation of sound waves. Using a digital-twin approach, we train conventional neural network architectures under physically motivated constraints: non-negative signals and weights, no bias terms, and nonlinearities compatible with intensity-based, non-negative acoustic signals. The framework connects learnable network components directly to physically measurable acoustic properties, enabling the systematic design of realizable acoustic computing systems. We demonstrate that constrained recurrent and hierarchical architectures can perform accurate speech classification, and we propose the SincHSRNN, a hybrid model that combines learnable acoustic bandpass filters with hierarchical temporal processing. The SincHSRNN achieves up to 95% accuracy on the AudioMNIST dataset while remaining compatible with passive acoustic components. Beyond computational performance, the learned parameters correspond to measurable material and geometric properties such as attenuation and transmission. Our results establish general design principles for physically realizable acoustic neural networks and outline a pathway toward low-power, wave-based neural computing.
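The abstract describes the SincHSRNN front end only as learnable acoustic bandpass filters. As a hedged illustration, assuming a SincNet-style parameterization in which each filter is fully described by its two cutoff frequencies (the quantities that would be learned), a windowed-sinc bandpass kernel and a non-negative band-intensity readout can be sketched in NumPy. The function names `sinc_bandpass` and `band_intensity`, the Hamming window, and the 101-tap length are illustrative assumptions, not details from the paper.

```python
import numpy as np

# Hypothetical SincNet-style front end; not the authors' implementation.

def sinc_bandpass(f_low, f_high, fs, num_taps=101):
    """Windowed-sinc bandpass kernel passing roughly f_low..f_high Hz.

    Built as the difference of two ideal low-pass responses and tapered with
    a Hamming window to reduce ripple. num_taps should be odd.
    """
    n = np.arange(num_taps) - (num_taps - 1) / 2    # symmetric tap indices
    lo, hi = f_low / fs, f_high / fs                  # cutoffs in cycles per sample
    # np.sinc(x) = sin(pi x) / (pi x); 2*f*sinc(2*f*n) is an ideal low-pass with cutoff f
    kernel = 2 * hi * np.sinc(2 * hi * n) - 2 * lo * np.sinc(2 * lo * n)
    return kernel * np.hamming(num_taps)

def band_intensity(signal, kernel):
    """Filter the waveform and return its mean squared output (non-negative)."""
    y = np.convolve(signal, kernel, mode="same")
    return float(np.mean(y ** 2))

fs = 8000                                  # 8 kHz, the rate quoted in the TL;DR
t = np.arange(fs) / fs                     # one second of samples
tone = np.sin(2 * np.pi * 1000 * t)        # 1 kHz test tone
kernel = sinc_bandpass(800, 1200, fs)
inside = band_intensity(tone, kernel)                          # tone lies in the passband
outside = band_intensity(tone, sinc_bandpass(2500, 3000, fs))  # tone lies far outside
```

With only two parameters per filter, each learned band stays directly interpretable as a pair of cutoff frequencies, which matches the interpretability argument made for the SincHSRNN; squaring the filtered output yields a non-negative quantity in line with the intensity-based constraint.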
