Trade-offs between structural richness and communication efficiency in music network representations

Lluc Bono Rosselló, Robert Jankowski, Hugues Bersini, Marián Boguñá, M. Ángeles Serrano

TL;DR

This work investigates how feature selection in music sequence representations alters the resulting network topology and the efficiency of communicating musical structure under perceptual constraints. By constructing eight encodings from single-feature to multi-feature representations and evaluating topology, entropy rate, and perceptual divergence against a cognitively informed model, the study reveals a trade-off: compressed encodings enable efficient communication with high entropy per step, while richer encodings preserve detailed musical structure but incur higher perceptual error. A key finding is that uncertainty concentrates in dynamically central nodes, and longer pieces intensify alignment between visitation and inference, suggesting that certain structural arrangements support efficient learning and prediction. The results provide a principled framework to relate representational choice to cognitive costs in sequential processing and have implications for applying similar analyses to other sequential domains.

Abstract

Music is a structured and perceptually rich sequence of sounds in time with well-defined symbolic features, whose perception is shaped by the interplay of expectation and uncertainty. Network science offers a powerful framework for studying its structural organization and communication efficiency. However, it remains unclear how feature selection affects the properties of reconstructed networks and perceptual alignment. Here, we systematically compare eight encodings of musical sequences, ranging from single-feature descriptions to richer multi-feature combinations. We show that representational choices fundamentally shape network topology, the distribution of uncertainty, and the estimated communication efficiency under perceptual constraints. Single-feature representations compress sequences into dense transition structures that support efficient communication, yielding high entropy rates with low modeled perceptual error, but they discard structural richness. By contrast, multi-feature representations preserve descriptive detail and structural specificity, expanding the state space and producing sharper transition profiles and lower entropy rates, which leads to higher modeled perceptual error. Across representations, we found that uncertainty increasingly concentrates in nodes with higher diffusion-based centrality while their perceptual error remains low, unveiling an interplay between predictable structure and localized surprise. Together, these results show that feature choice directly shapes music network representation, describing trade-offs between descriptive richness and communication efficiency and suggesting structural conditions that may support efficient learning and prediction.
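
The entropy rate mentioned above can be illustrated on a toy transition network. The following is a minimal sketch, assuming the usual entropy rate of a random walk, $S = -\sum_i \pi_i \sum_j P_{ij} \log_2 P_{ij}$, where $P_{ij}$ are row-normalized transition frequencies and $\pi$ is the stationary distribution. The function names, the toy pitch sequence, and the lazy power iteration used to estimate $\pi$ are illustrative assumptions, not the authors' implementation. The sketch also assumes every symbol has at least one outgoing transition.

```python
from collections import Counter, defaultdict
from math import log2


def transition_network(sequence):
    """Directed, weighted network: weights[a][b] = number of a -> b transitions."""
    weights = defaultdict(Counter)
    for a, b in zip(sequence, sequence[1:]):
        weights[a][b] += 1
    return weights


def transition_probabilities(weights):
    """Row-normalize edge weights into random-walk transition probabilities."""
    probs = {}
    for a, out_edges in weights.items():
        total = sum(out_edges.values())
        probs[a] = {b: w / total for b, w in out_edges.items()}
    return probs


def stationary_distribution(probs, max_iter=10_000, tol=1e-12):
    """Estimate pi by lazy power iteration (stay put with probability 1/2).

    The lazy walk has the same stationary distribution as the original walk
    and avoids oscillation on periodic chains. Assumes a strongly connected
    network in which every node has at least one outgoing edge.
    """
    nodes = list(probs)
    pi = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(max_iter):
        nxt = {n: 0.5 * pi[n] for n in nodes}
        for a in nodes:
            for b, p in probs[a].items():
                nxt[b] += 0.5 * pi[a] * p
        if max(abs(nxt[n] - pi[n]) for n in nodes) < tol:
            return nxt
        pi = nxt
    return pi


def entropy_rate(probs, pi):
    """Entropy rate in bits per transition: -sum_i pi_i sum_j P_ij log2 P_ij."""
    return -sum(
        pi[a] * p * log2(p)
        for a, out_edges in probs.items()
        for p in out_edges.values()
        if p > 0
    )


# Toy single-feature (pitch-only) sequence; purely illustrative.
sequence = ["C", "D", "E", "C", "D", "C"]
weights = transition_network(sequence)
probs = transition_probabilities(weights)
pi = stationary_distribution(probs)
rate = entropy_rate(probs, pi)
print(f"entropy rate: {rate:.3f} bits per transition")
```

In this toy network only D has two equally likely continuations (1 bit of uncertainty) and it carries stationary weight 0.4, so the entropy rate is about 0.4 bits per transition. Adding features to each symbol would enlarge the state space, which is the setting in which the abstract reports sharper transition profiles and lower entropy rates.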

Paper Structure

This paper contains 12 sections, 7 equations, 5 figures.

Figures (5)

  • Figure 1: Networks reconstructed from a musical piece. (a) A simple composition with highlighted interval changes. (b) Eight distinct network representations. All networks are directed and weighted: edges indicate transitions, and weights correspond to their frequencies.
  • Figure 2: Topological properties of music network models. Violin plots showing the distribution of eight network measures: (a) size $N$, (b) average degree $\langle k \rangle$, (c) degree heterogeneity $\langle k^2 \rangle/\langle k\rangle^2$, (d) average clustering coefficient $\langle C \rangle$, (e) average shortest path length $\langle l \rangle$, (f) reciprocity $r$, (g) average in-strength $\langle s_{in}\rangle$, and (h) betweenness centrality $BC$, across different models averaged over all musical pieces for the right hand. Models are grouped by the number of musical features used. Each symbol corresponds to an individual model and also highlights the median of each distribution.
  • Figure 3: Information content and perception efficiency in music network models. (a) Median entropy of weighted $S$ versus unweighted $S^{\mathrm{uw}}$ networks. (b) Median entropy $S$ versus entropy of type-A randomized networks. (c) Median Kullback-Leibler divergence of weighted $D_{KL}$ versus unweighted $D^{\mathrm{uw}}_{KL}$ networks. (d) Median $D_{KL}$ versus $D_{KL}$ of type-A randomized networks. (e) Median $D_{KL}$ versus median $S$. (f) Median $D^{\mathrm{uw}}_{KL}$ versus median $S^{\mathrm{uw}}$. Models are grouped by the number of musical features used. Each symbol corresponds to an individual model. Error bars show the interquartile range (IQR) around the median.
  • Figure 4: Global alignment of uncertainty and inference in music networks. (a) Median entropy $S$ versus mean node-level entropy $\bar{S}$. (b) Median KL divergence $D_{KL}$ versus node-averaged KL divergence $\bar{D}_{KL}$. (c) Mean node-level entropy with and without edge weights ($\bar{S}$ versus $\bar{S}^{\mathrm{uw}}$). (d) Median node-averaged KL divergence with and without edge weights ($\bar{D}_{KL}$ versus $\bar{D}^{\mathrm{uw}}_{KL}$). (e) Median entropy $S$ versus the difference between median and node-level entropy ($S - \bar{S}$). (f) Median KL divergence $D_{KL}$ versus the difference between median and node-averaged KL divergence ($D_{KL} - \bar{D}_{KL}$). In (e) and (f), data are grouped by transition count, with transition counts divided into five equal-sized bins. Smaller and more transparent markers correspond to shorter pieces. Points show medians, and error bars indicate the interquartile range. All panels correspond to right-hand tracks. Models are grouped by the number of musical features used. Each symbol corresponds to an individual model.
  • Figure 5: Local alignment of uncertainty and inference in music networks. (a,b) Median fraction of nodes required to accumulate successive quartiles of stationary mass ($M$). (c,d) Average node entropy ($\bar{S}$) across $\pi$ quartiles. (e,f) Node-averaged KL divergence ($\bar{D}_{KL}$) across $\pi$ quartiles. Points denote medians and error bars show interquartile ranges. Columns separate short and long musical compositions, defined by dividing pieces into two equal bins by the number of transitions. All panels correspond to right-hand tracks. Models are grouped by the number of musical features used. Each symbol corresponds to an individual model.
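
To make the aggregate quantities in the Figure 3 and Figure 4 captions concrete, the sketch below computes node-level entropy and node-level KL divergence for a toy network, then aggregates each both with stationary weights and as a plain average over nodes. It assumes that $S$ and $D_{KL}$ are stationary-weighted sums and that $\bar{S}$ and $\bar{D}_{KL}$ are unweighted node averages, which is one reading of the captions rather than a confirmed definition. The matrix `perceived` is a hypothetical stand-in for a listener's internal estimate (true transitions blended with a uniform guess); it is not the paper's perceptual model, and all names and numbers are illustrative.

```python
from math import log2


def node_entropy(row):
    """Entropy (bits) of the outgoing transition distribution of one node."""
    return sum(-p * log2(p) for p in row.values() if p > 0)


def node_kl(row, perceived_row):
    """KL divergence (bits) from a node's true transitions to the perceived ones."""
    return sum(p * log2(p / perceived_row[b]) for b, p in row.items() if p > 0)


def blend_with_uniform(probs, eps):
    """Hypothetical perceived transitions: (1 - eps) * true + eps * uniform."""
    nodes = list(probs)
    return {
        a: {b: (1 - eps) * probs[a].get(b, 0.0) + eps / len(nodes) for b in nodes}
        for a in nodes
    }


# Toy transition probabilities and their stationary distribution (illustrative).
probs = {"C": {"D": 1.0}, "D": {"C": 0.5, "E": 0.5}, "E": {"C": 1.0}}
pi = {"C": 0.4, "D": 0.4, "E": 0.2}
perceived = blend_with_uniform(probs, eps=0.1)

S_i = {a: node_entropy(probs[a]) for a in probs}
D_i = {a: node_kl(probs[a], perceived[a]) for a in probs}

# Stationary-weighted aggregates versus plain node averages.
S = sum(pi[a] * S_i[a] for a in probs)
S_bar = sum(S_i.values()) / len(S_i)
D = sum(pi[a] * D_i[a] for a in probs)
D_bar = sum(D_i.values()) / len(D_i)

print(f"S = {S:.3f}, S_bar = {S_bar:.3f}, S - S_bar = {S - S_bar:.3f}")
print(f"D_KL = {D:.3f}, D_KL_bar = {D_bar:.3f}, difference = {D - D_bar:.3f}")
```

In this toy case $S$ exceeds $\bar{S}$ because the only uncertain node, D, carries more stationary weight (0.4) than the uniform share of 1/3, which is the kind of gap the difference $S - \bar{S}$ is meant to capture.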