Acoustic Overspecification in Electronic Dance Music Taxonomy

Weilun Xu, Tianhao Dai, Oscar Goudet, Xiaoxuan Wang

Abstract

Electronic Dance Music (EDM) classification typically relies on industry-defined taxonomies, with current supervised approaches naturally assuming the validity of prescribed subgenre labels. However, whether these commercial distinctions reflect genuine acoustic differences remains largely unexplored. In this paper, we propose an unsupervised approach to discover the natural acoustic structure of EDM independent of commercial labels. To address the historical lack of EDM-specific feature design in MIR, we systematically construct a tailored, interpretable acoustic feature space capturing the genre's defining production techniques, spectral textures, and layered rhythmic patterns. To ensure our findings reflect inherent acoustic structure rather than feature engineering artifacts, we validate our clustering against state-of-the-art pre-trained audio embeddings (MERT and CLAP). Across both our bespoke feature space and the pre-trained embeddings, clustering consistently identifies 20 or fewer natural acoustic families -- suggesting current commercial EDM taxonomy is acoustically overspecified by nearly one-half.

Paper Structure

This paper contains 16 sections, 4 equations, 5 figures, 1 table.

Figures (5)

  • Figure 1: t-SNE projection of selected acoustic features (Section \ref{ssec:features}). Left: K-means clusters at the optimal $k$ identified in Section \ref{ssec:natural}. Right: identical projection colored by the 35 commercial Beatport genre labels. The right panel's color fragmentation within spatially coherent regions illustrates the overspecification of industry taxonomy relative to acoustic structure.
  • Figure 2: Feature importance analysis. (a) Top 20 features ranked by supervised ensemble score. (b) Importance distributions grouped by feature category: EDM-specific (Production & Texture, Tempogram-based) versus conventional MIR descriptors.
  • Figure 3: Confusion matrix mapping 35 commercial genre labels to 35 discovered acoustic clusters. The matrix is augmented by three marginals: Cluster Purity (top) measures the acoustic exclusivity of a cluster; Genre Concentration (left) measures how acoustically monolithic a commercial genre is; and Cluster Size (bottom) indicates the total volume of tracks in that acoustic space. The clear inverse relationship between cluster size and purity highlights how massive, generic acoustic templates absorb multiple overlapping commercial labels, while smaller, pure clusters represent distinct sonic niches.
  • Figure 4: Music profiles of selected clusters, comparing the five purest (top) against the five most hybridized (bottom). Each axis summarizes one of six acoustic dimensions---Energy, Danceability, Tempo, Harmonic complexity, Rhythmic density, and Electronic texture---by z-score normalizing all constituent features within a dimension, averaging them per track, and rescaling cluster means to 0--100 via $p_5$/$p_{95}$ interpolation. Pure clusters exhibit extreme, peaked profiles; mixed clusters converge toward moderate, overlapping shapes.
  • Figure 5: Hierarchical genre lineage derived from co-clustering affinity ($1-\hat{a}_{ij}$, where $\hat{a}_{ij}$ is the normalized co-clustering frequency across 50 K-means restarts). Four acoustic families and three isolated outliers emerge.
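
The profile scaling described in the Figure 4 caption can be illustrated with a minimal sketch for a single acoustic dimension. Several details here are assumptions rather than the paper's stated procedure: the function name `dimension_profile` is hypothetical, the $p_5$/$p_{95}$ anchors are taken over per-track scores (the caption does not say which distribution they come from), and values outside that range are clipped to 0 and 100.

```python
import numpy as np

def dimension_profile(features, cluster_labels):
    """Score one acoustic dimension per cluster on a 0-100 scale.

    features: (n_tracks, n_features) array holding the features that
        make up a single dimension (e.g. Energy).
    cluster_labels: (n_tracks,) array of cluster ids.
    """
    X = np.asarray(features, dtype=float)
    labels = np.asarray(cluster_labels)

    # z-score each constituent feature across all tracks
    std = X.std(axis=0)
    std[std == 0] = 1.0  # guard against constant features
    z = (X - X.mean(axis=0)) / std

    # average the normalized features per track
    track_score = z.mean(axis=1)

    # assumption: p5/p95 anchors come from the per-track score distribution
    lo, hi = np.percentile(track_score, [5, 95])
    span = hi - lo if hi > lo else 1.0

    # rescale each cluster mean by linear interpolation, clipped to [0, 100]
    clusters = np.unique(labels)
    means = np.array([track_score[labels == c].mean() for c in clusters])
    scaled = np.clip((means - lo) / span, 0.0, 1.0) * 100.0
    return dict(zip(clusters.tolist(), scaled.tolist()))

# toy data: 100 tracks, two perfectly correlated features, two clusters
X = np.column_stack([np.arange(100), 2 * np.arange(100)])
labels = np.repeat([0, 1], 50)
profile = dimension_profile(X, labels)
```

Repeating the call for each of the six dimensions would give one radar axis per dimension. Anchoring on percentiles rather than the minimum and maximum keeps a few extreme tracks from compressing every other cluster toward the middle of the scale.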
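
The distance $1-\hat{a}_{ij}$ in the Figure 5 caption can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function name `coclustering_distance` is hypothetical, the label matrix is assumed to hold one row of cluster assignments per K-means restart, and $\hat{a}_{ij}$ is assumed to be the fraction of restarts in which items $i$ and $j$ share a cluster. How the paper aggregates tracks into genres before this step is not specified in the caption.

```python
import numpy as np

def coclustering_distance(label_runs):
    """Return the matrix 1 - a_hat, where a_hat[i, j] is the fraction of
    runs in which items i and j were assigned to the same cluster.

    label_runs: (n_runs, n_items) integer array, one row per restart.
    """
    runs = np.asarray(label_runs)
    n_runs, n_items = runs.shape
    counts = np.zeros((n_items, n_items))
    for labels in runs:
        # boolean matrix: True where two items share a cluster in this run
        counts += labels[:, None] == labels[None, :]
    affinity = counts / n_runs  # normalized co-clustering frequency
    return 1.0 - affinity

# toy example: 3 restarts over 4 items (the caption reports 50 restarts)
runs = [[0, 0, 1, 1],
        [1, 1, 0, 2],
        [0, 0, 0, 1]]
dist = coclustering_distance(runs)
```

Because the comparison only checks whether two labels match within a run, the result does not depend on how cluster ids are numbered from one restart to the next. The resulting symmetric matrix could then be passed to a hierarchical clustering routine to obtain a lineage tree.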