Musical Metamerism with Time--Frequency Scattering

Vincent Lostanlen; Han Han

TL;DR

The paper studies how listeners recognize music as familiar from its contour, and proposes a method to synthesize musical metamers from any audio recording using joint time--frequency scattering (JTFS). JTFS coefficients are coarsened by Gaussian averaging over time and log-frequency, so that the resulting metamers are invariant to temporal shifts up to $T$ and transpositions up to $F$. Metamers are synthesized by gradient-based reconstruction from a random initialization. The approach requires no manual preprocessing such as transcription or beat tracking, connects questions from cognitive science with signal-processing representations, and supports reproducible research. The paper also discusses connections to spectrotemporal representations such as STRF, MPS, and GBFB. The method is implemented in Kymatio with GPU support, for use in music cognition experiments and beyond.

Abstract

The concept of metamerism originates from colorimetry, where it describes a sensation of visual similarity between two colored lights despite significant differences in spectral content. Likewise, we propose to call ``musical metamerism'' the sensation of auditory similarity elicited by two music fragments that differ in their underlying waveforms. In this technical report, we describe a method to generate musical metamers from any audio recording. Our method is based on joint time--frequency scattering (JTFS) in Kymatio, an open-source Python package which enables GPU computing and automatic differentiation. The advantage of our method is that it does not require any manual preprocessing, such as transcription, beat tracking, or source separation. We provide a mathematical description of JTFS as well as some excerpts from the Kymatio source code. Lastly, we review the prior work on JTFS and draw connections with closely related algorithms, such as spectrotemporal receptive fields (STRF), modulation power spectra (MPS), and Gabor filterbanks (GBFB).
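
The reconstruction idea summarized above can be illustrated with a toy sketch. This is not the authors' implementation and does not call Kymatio: as an assumption made purely for illustration, JTFS is replaced by a much simpler representation (a circular Gaussian average of the squared signal), and the gradient is written by hand instead of relying on automatic differentiation. The function names, the backtracking step-size rule, and all parameter values are hypothetical.

```python
import math
import random


def gaussian_window(n, sigma):
    """Circular Gaussian lowpass window of length n, normalized to sum to 1."""
    w = [math.exp(-0.5 * (min(k, n - k) / sigma) ** 2) for k in range(n)]
    total = sum(w)
    return [v / total for v in w]


def features(x, w):
    """Toy stand-in for JTFS (assumption): Gaussian average of the squared signal."""
    n = len(x)
    return [sum(w[(k - m) % n] * x[m] ** 2 for m in range(n)) for k in range(n)]


def loss_and_grad(x, target_feat, w):
    """Squared Euclidean distance in feature space and its gradient w.r.t. x."""
    n = len(x)
    resid = [f - g for f, g in zip(features(x, w), target_feat)]
    loss = sum(r * r for r in resid)
    # d loss / d x[m] = 4 * x[m] * sum_k resid[k] * w[(k - m) % n]
    grad = [4.0 * x[m] * sum(resid[k] * w[(k - m) % n] for k in range(n))
            for m in range(n)]
    return loss, grad


def reconstruct(target, sigma=4.0, n_iter=50, seed=0):
    """Gradient descent from a random initialization toward the target features."""
    rng = random.Random(seed)
    n = len(target)
    w = gaussian_window(n, sigma)
    target_feat = features(target, w)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]  # random initialization
    loss, grad = loss_and_grad(x, target_feat, w)
    history = [loss]
    step = 1.0
    for _ in range(n_iter):
        # Backtracking: halve the step until the loss decreases.
        for _attempt in range(40):
            cand = [xi - step * gi for xi, gi in zip(x, grad)]
            cand_loss, cand_grad = loss_and_grad(cand, target_feat, w)
            if cand_loss < loss:
                x, loss, grad = cand, cand_loss, cand_grad
                step *= 2.0
                break
            step *= 0.5
        history.append(loss)
    return x, history


if __name__ == "__main__":
    # Hypothetical target: a sinusoid whose amplitude drops halfway through.
    target = [math.sin(2 * math.pi * 5 * k / 32) * (1.0 if k < 16 else 0.2)
              for k in range(32)]
    metamer, history = reconstruct(target)
    print("initial loss:", history[0])
    print("final loss:  ", history[-1])
```

The reconstructed signal generally differs from the target sample by sample while approaching it in feature space, which is the sense in which it plays the role of a metamer in this toy setting. In the method described in the abstract, the representation is JTFS as implemented in Kymatio, and gradients are obtained by automatic differentiation.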

Paper Structure (19 sections, 7 equations, 1 figure)

Introduction
Problem statement
Contribution
Method
Joint time--frequency scattering
Averaging
Reconstruction
Implementation
Filterbanks
First layer
Second layer, time variable
Second layer, frequency variable
Main algorithm
Related work
Applications to audio classification
...and 4 more sections

Figures (1)

Figure 1: Interference pattern between wavelets $\boldsymbol{\psi}_\alpha (t)$ and $\boldsymbol{\psi}_\beta (\log_2 \lambda)$ in the time--frequency domain $(t, \log_2 \lambda)$ for different combinations of amplitude modulation rate $\alpha$ and frequency modulation scale $\beta$. Darker shades of red (resp. blue) indicate higher positive (resp. lower negative) values of the real part.
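
The pattern described in the caption can be approximated numerically. The sketch below assumes that the joint wavelet is the separable product $\boldsymbol{\psi}_\alpha (t)\, \boldsymbol{\psi}_\beta (\log_2 \lambda)$ and, as a simplification, uses complex Gabor atoms in place of the wavelets actually used in Kymatio. The `gabor` helper, its `cycles` parameter, and the grid bounds are hypothetical choices made for illustration.

```python
import cmath
import math


def gabor(x, xi, cycles=2.0):
    """Complex Gabor atom with center frequency xi (nonzero).

    The Gaussian envelope has standard deviation cycles / |xi|, a hypothetical
    choice that keeps the number of oscillations roughly constant.
    """
    sigma = cycles / abs(xi)
    return math.exp(-x * x / (2.0 * sigma * sigma)) * cmath.exp(2j * math.pi * xi * x)


def joint_wavelet_real(t, u, alpha, beta):
    """Real part of psi_alpha(t) * psi_beta(u), with u standing for log2(lambda)."""
    return (gabor(t, alpha) * gabor(u, beta)).real


def sample_grid(alpha, beta, n=9, t_max=0.5, u_max=2.0):
    """Sample the real part on an n-by-n grid over [-t_max, t_max] x [-u_max, u_max]."""
    ts = [-t_max + 2.0 * t_max * i / (n - 1) for i in range(n)]
    us = [-u_max + 2.0 * u_max * j / (n - 1) for j in range(n)]
    return [[joint_wavelet_real(t, u, alpha, beta) for t in ts] for u in us]


if __name__ == "__main__":
    # Flipping the sign of beta mirrors the pattern along the log-frequency axis.
    for beta in (1.0, -1.0):
        print("alpha = 4.0, beta =", beta)
        for row in sample_grid(4.0, beta):
            print("".join("+" if v > 0 else "-" for v in row))
```

Under these assumptions, the real part is a Gaussian envelope multiplied by $\cos\big(2\pi(\alpha t + \beta \log_2 \lambda)\big)$, so its ridges follow lines of constant $\alpha t + \beta \log_2 \lambda$ and their orientation depends on the sign of $\beta$.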
