Spiking Music: Audio Compression with Event Based Auto-encoders
Martim Lisboa, Guillaume Bellec
TL;DR
This work investigates whether audio compression can benefit from event-based representations inspired by neural spikes. It introduces Spiking Music, an end-to-end binary auto-encoder whose latent is a binary matrix $z \in \{0,1\}^{N \times T_z}$. The model replaces vector quantization with a differentiable binary quantizer and supports both dense and sparse storage regimes. The paper reports competitive reconstruction at about $3$ kbps in the dense setting and reaches $2.59$ kbps in the sparse regime through a sparsity-driven training schedule, with additional gains from a controllable $\mu$-SPARSE variant that targets a bitrate while maintaining quality. A key finding is that, in the sparse regime, the latent units become selective and synchronized with piano note onsets. This suggests that the event-based code captures high-level musical structure, with potential energy-efficiency benefits for future hardware implementations.
Abstract
Neurons in the brain communicate information via discrete events called spikes. The timing of spikes is thought to carry rich information, but it is not clear how to leverage this in digital systems. We demonstrate that event-based encoding is efficient for audio compression. To build this event-based representation we use a deep binary auto-encoder. Under high sparsity pressure, the model enters a regime where the binary event matrix is stored more efficiently with sparse matrix storage algorithms. We test this on the large MAESTRO dataset of piano recordings against vector quantized auto-encoders. Not only does our "Spiking Music compression" algorithm achieve a competitive compression/reconstruction trade-off, but selectivity and synchrony between encoded events and piano key strikes emerge without supervision in the sparse regime.
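To make the dense versus sparse storage trade-off concrete, the following Python sketch compares two ways of storing a binary event matrix. It is an illustration under stated assumptions, not the paper's storage scheme: the coordinate-list encoding, the helper names (`dense_bits`, `sparse_bits`, `bitrate_kbps`, `break_even_density`), and the example sizes (64 units, 50 latent frames per second) are all hypothetical and chosen only to keep the arithmetic easy to follow.

```python
def dense_bits(n_units: int, n_steps: int) -> int:
    """Dense storage: one bit per entry of the N x T_z binary matrix."""
    return n_units * n_steps


def index_bits(n_units: int, n_steps: int) -> int:
    """Bits needed to address one entry with a fixed-width flat index."""
    return (n_units * n_steps - 1).bit_length()


def sparse_bits(n_events: int, n_units: int, n_steps: int) -> int:
    """Sparse storage (assumed scheme): store the flat index of every 1.

    This is a simple coordinate-list encoding picked for illustration;
    a real codec would likely use a more compact entropy code.
    """
    return n_events * index_bits(n_units, n_steps)


def bitrate_kbps(bits: int, duration_s: float) -> float:
    """Convert a bit count over a duration into kilobits per second."""
    return bits / duration_s / 1000.0


def break_even_density(n_units: int, n_steps: int) -> float:
    """Fraction of active entries below which the sparse scheme is smaller."""
    return 1.0 / index_bits(n_units, n_steps)


if __name__ == "__main__":
    # Hypothetical sizes: 64 latent units, 50 frames covering 1 second.
    n_units, n_steps, duration_s = 64, 50, 1.0
    n_events = 100  # hypothetical number of spikes in this window

    print("dense  kbps:", bitrate_kbps(dense_bits(n_units, n_steps), duration_s))
    print("sparse kbps:", bitrate_kbps(sparse_bits(n_events, n_units, n_steps), duration_s))
    print("break-even density:", break_even_density(n_units, n_steps))
```

Under this assumed encoding, sparse storage only wins once the fraction of active entries falls below one over the index width, which is one way to read why the sparse regime requires strong sparsity pressure during training.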
