Acoustic neural networks: Identifying design principles and exploring physical feasibility

Ivan Kalthoff, Marcel Rey, Raphael Wittkowski

TL;DR

This work proposes a digital-twin framework for acoustic neural networks that constrains computation to passive, non-negative acoustic operations, with weights in $[0,1]$ and activations compatible with intensity-based signals. It develops three architectures: RNNs, hierarchical subsampling RNNs (HSRNNs), and SincHSRNNs with learnable sinc front ends. Evaluated on AudioMNIST, a physically realizable SincHSRNN reaches up to approximately $95\%$ test accuracy at 8 kHz while preserving interpretability through bandpass filters. The study identifies general design principles, including the benefits of non-negativity as regularization, the necessity of hierarchical temporal integration, and the value of frequency-selective preprocessing for acoustic implementations. By mapping learned parameters to measurable attenuation and transmission, the work outlines a concrete pathway toward low-power, wave-based neural computing and direct analog acoustic processing at the edge.

Abstract

Waveguide-based physical systems provide a promising route toward energy-efficient analog computing beyond traditional electronics. Within this landscape, acoustic neural networks could enable low-power computation in environments where electronics are inefficient or limited, yet their systematic design has remained largely unexplored. Here we introduce a framework for designing and simulating acoustic neural networks, which perform computation through the propagation of sound waves. Using a digital-twin approach, we train conventional neural network architectures under physically motivated constraints including non-negative signals and weights, the absence of bias terms, and nonlinearities compatible with intensity-based, non-negative acoustic signals. Our work provides a general framework for acoustic neural networks that connects learnable network components directly to physically measurable acoustic properties, enabling the systematic design of realizable acoustic computing systems. We demonstrate that constrained recurrent and hierarchical architectures can perform accurate speech classification, and we propose the SincHSRNN, a hybrid model that combines learnable acoustic bandpass filters with hierarchical temporal processing. The SincHSRNN achieves up to 95% accuracy on the AudioMNIST dataset while remaining compatible with passive acoustic components. Beyond computational performance, the learned parameters correspond to measurable material and geometric properties such as attenuation and transmission. Our results establish general design principles for physically realizable acoustic neural networks and outline a pathway toward low-power, wave-based neural computing.
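The constraints named in the abstract (non-negative signals and weights, weights bounded by $[0,1]$, no bias terms) can be illustrated with a minimal recurrent update. This is only a sketch under stated assumptions, not the authors' implementation: the function names, the layer sizes, the use of $\tanh$ as the saturating nonlinearity, the initialization scale of 0.05, and the enforcement of the weight range by clipping are all illustrative choices that the text above does not specify.

```python
import math
import random


def clip_weights(matrix):
    """Project every weight onto [0, 1], the range of a passive transmission coefficient.

    Assumption: clipping is one simple way to enforce the bound; the source does
    not say how the constraint is imposed during training.
    """
    return [[min(1.0, max(0.0, w)) for w in row] for row in matrix]


def init_weights(rows, cols, c, rng):
    """Draw non-negative weights from U(0, c); c is an illustrative scale."""
    return [[rng.uniform(0.0, c) for _ in range(cols)] for _ in range(rows)]


def rnn_step(x, h, w_in, w_rec):
    """One bias-free recurrent update on non-negative signals.

    With non-negative inputs, states, and weights, the weighted sum is
    non-negative, so tanh maps it into [0, 1) and the state stays non-negative.
    """
    new_h = []
    for j in range(len(h)):
        total = sum(w_in[j][i] * x[i] for i in range(len(x)))
        total += sum(w_rec[j][k] * h[k] for k in range(len(h)))
        new_h.append(math.tanh(total))  # no bias term is added
    return new_h


rng = random.Random(0)
w_in = clip_weights(init_weights(4, 2, 0.05, rng))   # 2 inputs -> 4 hidden units
w_rec = clip_weights(init_weights(4, 4, 0.05, rng))  # 4 hidden -> 4 hidden units

h = [0.0] * 4
for x in ([0.3, 0.1], [0.0, 0.8], [0.5, 0.5]):  # toy non-negative intensity inputs
    h = rnn_step(x, h, w_in, w_rec)
print(h)
```

Because every quantity in the update is non-negative, the hidden state can only be attenuated or saturated, never sign-flipped, which mirrors the passive-component picture described above.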

Paper Structure

This paper contains 18 sections, 1 equation, 4 figures, 8 tables.

Figures (4)

  • Figure 1: Conceptual schematic of an acoustic neural network. Information is transmitted and processed using sound waves instead of electrical signals. Propagation paths act as attenuating connections with transmission coefficients corresponding to neural weights ($w \in [0,1]$). Nonlinear transformations arise from intensity-dependent attenuation within acoustic media, serving as the activation function. Summation of signals occurs through the superposition of converging waves.
  • Figure 2: Raw waveform input is processed by a front end of five learnable sinc filters that act as interpretable bandpass elements. The filtered signals are passed to an HSRNN composed of three to four recurrent layers, labeled RNN1–RNN3 in the figure, each preceded by temporal downsampling by a factor of 8 with learnable weights. The resulting feature representation is fed to two fully connected layers, denoted Dense in the figure, with $\tanh$ activations and a final fully connected output layer followed by a softmax classifier that produces the digit-class probabilities.
  • Figure 3: Test accuracy (%) of the constrained HSRNN as a function of the upper bound $c$ of the uniform weight initialization distribution $\mathcal{U}(0, c)$. Results are shown for the 8-16-32 architecture over three independent runs. Stable training is observed only for intermediate initialization scales ($0.01 \leq c \leq 0.06$), illustrating the high sensitivity of constrained models to weight initialization.
  • Figure 4: (a), (b) Test accuracy heatmaps and (c), (d) confusion matrices for (a), (c) constrained and (b), (d) unconstrained SincHSRNNs on the full ten-digit AudioMNIST dataset. (a), (b) Test accuracy heatmaps show the mean test accuracies (%) across hidden-unit configurations and sampling rates for the constrained and unconstrained networks, respectively. Overall performance increases with sampling rate, and the gap between constrained and unconstrained models narrows at higher capacities. (c), (d) Confusion matrices for the best-performing architecture (8-16-32-64) evaluated at a sampling rate of 8 kHz, illustrating that both models achieve high per-class accuracy with only minor deviations.
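The sinc filters mentioned in the Figure 2 caption act as bandpass elements defined by two cutoff frequencies. A common way to build such a filter digitally is to subtract two low-pass sinc kernels and apply a window; the sketch below follows that textbook construction and should not be read as the paper's exact front end. The band edges (500 Hz and 1000 Hz), the 101-tap length, the Hamming window, and all function names are assumptions chosen for illustration; only the 8 kHz sampling rate is taken from the text above.

```python
import math


def sinc(x):
    """Normalized sinc function, sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0.0 else math.sin(math.pi * x) / (math.pi * x)


def bandpass_kernel(f_low, f_high, sample_rate, num_taps=101):
    """Windowed-sinc bandpass kernel: difference of two low-pass sinc kernels."""
    f1 = f_low / sample_rate   # lower cutoff in cycles per sample
    f2 = f_high / sample_rate  # upper cutoff in cycles per sample
    mid = (num_taps - 1) // 2
    kernel = []
    for n in range(num_taps):
        k = n - mid
        ideal = 2.0 * f2 * sinc(2.0 * f2 * k) - 2.0 * f1 * sinc(2.0 * f1 * k)
        window = 0.54 - 0.46 * math.cos(2.0 * math.pi * n / (num_taps - 1))  # Hamming
        kernel.append(ideal * window)
    return kernel


def convolve_valid(signal, kernel):
    """Direct convolution, keeping only fully overlapping output samples."""
    m = len(kernel)
    return [sum(signal[i + j] * kernel[m - 1 - j] for j in range(m))
            for i in range(len(signal) - m + 1)]


def rms(values):
    """Root-mean-square amplitude of a sequence."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def tone(freq, sample_rate, num_samples):
    """Unit-amplitude sine wave at the given frequency."""
    return [math.sin(2.0 * math.pi * freq * n / sample_rate)
            for n in range(num_samples)]


SAMPLE_RATE = 8000  # Hz, the sampling rate quoted for the best reported result
kernel = bandpass_kernel(500.0, 1000.0, SAMPLE_RATE)  # hypothetical band edges

in_band = rms(convolve_valid(tone(750.0, SAMPLE_RATE, 800), kernel))
out_band = rms(convolve_valid(tone(3000.0, SAMPLE_RATE, 800), kernel))
print(in_band, out_band)
```

In a learnable front end, only the two cutoff frequencies per filter would be trained, which is what makes each filter directly interpretable as a passband. Note that this kernel has negative taps, so it is a digital stand-in for the bandpass behavior rather than a statement about how such a filter would be realized acoustically.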