In-depth analysis of music structure as a text network

Ping-Rui Tsai; Yen-Ting Chou; Nathan-Christopher Wang; Hui-Ling Chen; Hong-Yue Huang; Zih-Jia Luo; Tzay-Ming Hong

TL;DR

This work treats music as a natural-language-like system and builds an Essential Element Network (EEN) by decomposing musical content into Essential Elements (EE) and defining CC-based words, enforcing Zipf-like statistics to compare CPP periods. It introduces a linking rule with $I$, a threshold $I_m$, and a CC-based word construction, then maps audio to note-time-space and optimizes weights $(w_1,...,w_4)$ from 4032 configurations to fit Zipf's law and maximize word-type diversity. Through 2D EEN representations and CNN-based classification, the paper shows period-specific weight profiles (e.g., high $w_3$ for Baroque) and analyzes CC trends, histograms, and robustness to word removal, achieving music-vs-non-music discrimination. The framework links music structure to natural language processing and knowledge graphs, enabling quantitative analysis, potential music generation via GANs, and cross-disciplinary insights in anthropology and cognition.
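The Zipf criterion mentioned above can be illustrated with a minimal sketch. It assumes the "words" have already been extracted and are available as hashable tokens; the toy tokens below are placeholders, not CCs from real music. The function name `zipf_exponent` and the ordinary least-squares fit on the log-log rank-frequency curve are illustrative choices, not necessarily the paper's fitting procedure.

```python
import math
from collections import Counter


def zipf_exponent(words):
    """Estimate a Zipf exponent s and the number of word types.

    Fits log(frequency) = c - s * log(rank) by ordinary least squares.
    `words` is any iterable of hashable tokens (hypothetical stand-ins
    for the CC-based words described in the text).
    """
    # Frequencies sorted from most to least common give the rank order.
    counts = sorted(Counter(words).values(), reverse=True)
    if len(counts) < 2:
        raise ValueError("need at least two distinct word types")

    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]

    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))

    slope = sxy / sxx
    return -slope, len(counts)


# Toy corpus whose frequencies are exactly proportional to 1 / rank.
toy = []
for rank in range(1, 7):
    toy.extend([f"w{rank}"] * (60 // rank))

s, n_types = zipf_exponent(toy)
print(f"exponent ~ {s:.3f}, word types = {n_types}")
```

In a weight search of the kind described, each candidate configuration would yield its own word sequence, and a score combining closeness of the exponent to the Zipf value with the number of word types could rank the configurations. How the paper combines those two criteria is not stated here, so that step is left out of the sketch.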

Abstract

Music, enchanting and poetic, permeates every corner of human civilization. Although music is not unfamiliar to people, our understanding of its essence remains limited, and there is still no universally accepted scientific description. This is primarily due to music being regarded as a product of both reason and emotion, making it difficult to define. In this article, we focus on the fundamental elements of music and construct an evolutionary network from the perspective of music as a natural language, aligning with the statistical characteristics of texts. Through this approach, we aim to comprehend the structural differences in music across different periods, enabling a more scientific exploration of music. Relying on the advantages of structuralism, we can concentrate on the relationships and order between the physical elements of music, rather than getting entangled in the blurred boundaries of science and philosophy. The scientific framework we present not only conforms to past conclusions in music, but also serves as a bridge that connects music to natural language processing and knowledge graphs.

Paper Structure (10 sections, 2 equations, 8 figures)

Introduction
Network modeling
Distinguishing Different Musical Periods
Criterion: Weights
Criterion: Trend and Histogram of Words
Deep Learning analyses
Training Information
Minimum Features and Robustness of Text Networks
Difference Between Music and Non-music and the Evolution of Words
Conclusion and Discussions

Figures (8)

Figure 1: This illustration depicts the three main periods discussed in the article: the Baroque, Classical, and Romantic periods, each represented by one musician. We also illustrate the overall process of textualizing music into EEs.
Figure 2: Volumes are shown in the note vs. time plot for a music score by Ryuichi Sakamoto. The exchange of information between two pixels is determined by their elements and weights. Using Eq. (1), we calculate the linking condition for the text network of the music and define the link between the orange and yellow dots. The CCs in the network are then treated as words.
Figure 3: (a) A t-SNE mapping of the four weights and the threshold value onto the eigenspace. The dashed line highlights the existence of two clusters, whose statistical populations are given in parentheses. (b) A t-test assesses the statistical significance of the weight selection; the orange dotted line shows the p-value, read on the right y-axis. (c) A log-log plot of the Zipf distribution in different periods. (d) The Zipf distribution of different types of sounds under the range of weight selection. Each line is labeled by the initial letter of its sound type; ambient sound includes bird, river, and city traffic.
Figure 4: The CC is plotted against sequence. Two samples of EEN distributions in one dimension are shown in (a, b), both of which turn out to be periodic. (c) Compared to the Baroque period and ambient sound, the variation of CC in the Romantic period is more pronounced, which is in line with the conventional view in music history that its musical form is more diverse. (d) The distribution of different types of audio, where the music samples include 38 piano songs with a length of less than two minutes.
Figure 5: Histograms for the Romantic period, Beethoven, the Baroque period, and ambient sounds are shown in (a), (b), (c), and (d), respectively. The Romantic period is the only one that departs from a normal (Gaussian) distribution, exhibiting multiple peaks.
...and 3 more figures
