Audio Fingerprinting with Holographic Reduced Representations
Yusuke Fujita, Tatsuya Komatsu
TL;DR
The paper addresses the storage burden of neural audio fingerprinting, which normally requires many fingerprints per track to reach high accuracy and time resolution. It uses holographic reduced representations (HRR) to aggregate a sequence of fingerprints into a single composite vector: each fingerprint is bound to a distinct position vector by circular convolution, and the bound vectors are summed. This allows a search to find the composite that contains a query fingerprint and to recover the query's position within it, without storing every fingerprint; sequence-level search is supported via concatenated queries. On the FMA dataset, HRR-based aggregation achieves higher Top-1 accuracy than simple decimation or summation while preserving time resolution, with modest degradation relative to uncompressed fingerprints; HRR-aware training yields only marginal gains. Overall, the approach offers a storage-efficient alternative for large-scale audio fingerprinting, with practical implications for real-time search and scalable databases. Improved training and non-linear aggregation are identified as directions for future work.
Abstract
This paper proposes an audio fingerprinting model with holographic reduced representation (HRR). The proposed method reduces the number of stored fingerprints, whereas conventional neural audio fingerprinting requires many fingerprints for each audio track to achieve high accuracy and time resolution. We utilize HRR to aggregate multiple fingerprints into a composite fingerprint via circular convolution and summation, resulting in fewer fingerprints with the same dimensional space as the original. Our search method efficiently finds a combined fingerprint in which a query fingerprint exists. Using HRR's inverse operation, it can recover the relative position within a combined fingerprint, retaining the original time resolution. Experiments show that our method can reduce the number of fingerprints with modest accuracy degradation while maintaining the time resolution, outperforming simple decimation and summation-based aggregation methods.
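The aggregation and position recovery described in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. It assumes random unit-norm vectors as stand-ins for neural fingerprints, random unitary position keys (so the HRR inverse is exact), and a brute-force scoring rule that unbinds every composite with every key. The names `random_unitary_keys`, `aggregate`, and `search` are hypothetical, as are the dimension and the number of fingerprints per composite.

```python
import numpy as np


def circular_convolution(a, b):
    """Bind real vectors along the last axis via FFT-based circular convolution."""
    n = a.shape[-1]
    return np.fft.irfft(np.fft.rfft(a) * np.fft.rfft(b), n=n)


def involution(p):
    """HRR approximate inverse: inv[i] = p[-i mod n]. Exact for unitary vectors."""
    return np.roll(p[..., ::-1], 1, axis=-1)


def random_unitary_keys(count, dim, rng):
    """Hypothetical position keys whose spectra have unit magnitude (assumption)."""
    spectrum = np.fft.rfft(rng.standard_normal((count, dim)))
    spectrum /= np.abs(spectrum)
    return np.fft.irfft(spectrum, n=dim)


def aggregate(fingerprints, keys):
    """Sum key-bound fingerprints: (N, K, dim) and (K, dim) -> (N, dim)."""
    return circular_convolution(keys[None, :, :], fingerprints).sum(axis=1)


def search(query, composites, keys):
    """Return (composite index, position) with the highest unbinding score."""
    # Unbind each composite with each inverse key: shape (N, K, dim).
    unbound = circular_convolution(
        involution(keys)[None, :, :], composites[:, None, :]
    )
    scores = unbound @ query  # inner products, shape (N, K)
    idx, pos = np.unravel_index(np.argmax(scores), scores.shape)
    return int(idx), int(pos)


rng = np.random.default_rng(0)
dim, num_composites, per_composite = 256, 8, 4  # illustrative sizes only

# Stand-in fingerprints: random unit vectors instead of neural embeddings.
fingerprints = rng.standard_normal((num_composites, per_composite, dim))
fingerprints /= np.linalg.norm(fingerprints, axis=-1, keepdims=True)

keys = random_unitary_keys(per_composite, dim, rng)
composites = aggregate(fingerprints, keys)  # 8 stored vectors instead of 32

# Query: a lightly perturbed copy of the fingerprint at composite 5, position 2.
query = fingerprints[5, 2] + 0.1 * rng.standard_normal(dim) / np.sqrt(dim)
found = search(query, composites, keys)
print(found)
```

In this sketch the stored database shrinks by the factor `per_composite` while each composite keeps the original dimension, and the recovered position index restores the time offset within the composite. The other fingerprints in a composite act as noise after unbinding, which is one plausible reading of why accuracy degrades as more fingerprints are aggregated.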
