Mode-conditioned music learning and composition: a spiking neural network inspired by neuroscience and psychology

Qian Liang, Yi Zeng, Menghaoran Tang

TL;DR

This work addresses the gap between AI-generated music and human cognitive understanding by proposing a brain-inspired spiking neural network that learns Western modes and keys and generates four-part harmony conditioned on mode and key. The architecture combines a Music Theory Subsystem with a Sequential Memory Subsystem, using neural circuit evolution and STDP-based learning to form cross-subsystem connections and encode tonal hierarchies similar to those of the Krumhansl-Schmuckler (KS) model. The learned connections align closely with the KS profiles, and the model yields four-part compositions whose tonal characteristics and melodic adaptability match the conditioning keys and modes, validated on the SHTE and Bach datasets. By integrating neuroscience, psychology, and music theory within a spiking framework, the method offers a cognitively grounded pathway for expressive, tonally coherent music generation, with potential implications for harmonic learning and emotion-aware composition.

Abstract

Musical mode is one of the most critical elements in establishing the framework of pitch organization and determining harmonic relationships. Previous works often use a simplistic and rigid alignment method and overlook the diversity of modes. In contrast to AI models, humans possess cognitive mechanisms for perceiving various modes and keys. In this paper, we propose a spiking neural network inspired by brain mechanisms and psychological theories to represent musical modes and keys, ultimately generating musical pieces that incorporate tonality features. Specifically, the contributions are as follows: 1) The model is designed with multiple collaborating subsystems inspired by the structures and functions of the corresponding brain regions; 2) We incorporate mechanisms for neural circuit evolutionary learning that enable the network to learn and generate mode-related features in music, reflecting the cognitive processes involved in human music perception; 3) The results demonstrate that the proposed model develops a connection framework closely resembling the Krumhansl-Schmuckler model, one of the most significant key perception models in music psychology; 4) Experiments show that the model can generate music pieces with the characteristics of the given modes and keys. Additionally, quantitative assessments of the generated pieces reveal that they have both tonality characteristics and the melodic adaptability needed to produce diverse and musical content. By combining insights from neuroscience, psychology, and music theory with advanced neural network architectures, our research aims to create a system that not only learns and generates music but also bridges the gap between human cognition and artificial intelligence.
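For readers unfamiliar with the Krumhansl-Schmuckler model referenced above, the following is a minimal sketch of its key-finding idea: a 12-element pitch-class weight vector is correlated with a major and a minor tonal-hierarchy profile at each of the 12 possible tonics, and the best-correlated key is returned. The profile values are the commonly cited Krumhansl-Kessler probe-tone ratings; the function names (`pearson`, `estimate_key`) and the input format are illustrative assumptions and are not taken from the paper, which compares learned synaptic weights with the profiles rather than running this procedure.

```python
from math import sqrt

# Commonly cited Krumhansl-Kessler probe-tone profiles, tonic first.
MAJOR_PROFILE = [6.35, 2.23, 3.48, 2.33, 4.38, 4.09,
                 2.52, 5.19, 2.39, 3.66, 2.29, 2.88]
MINOR_PROFILE = [6.33, 2.68, 3.52, 5.38, 2.60, 3.53,
                 2.54, 4.75, 3.98, 2.69, 3.34, 3.17]
PITCH_NAMES = ["C", "C#", "D", "D#", "E", "F",
               "F#", "G", "G#", "A", "A#", "B"]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if degenerate)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = sqrt(var_x * var_y)
    return cov / denom if denom > 0 else 0.0


def estimate_key(pc_weights):
    """Return (correlation, tonic_name, mode) for the best-matching key.

    pc_weights: 12 non-negative weights indexed by pitch class (C = 0),
    e.g. total note durations or, hypothetically, mean synaptic weights.
    """
    best = None
    for tonic in range(12):
        # Rotate so that index 0 of the vector corresponds to the candidate tonic.
        rotated = pc_weights[tonic:] + pc_weights[:tonic]
        for mode, profile in (("major", MAJOR_PROFILE), ("minor", MINOR_PROFILE)):
            r = pearson(rotated, profile)
            if best is None or r > best[0]:
                best = (r, PITCH_NAMES[tonic], mode)
    return best


# Usage: a weight vector shaped exactly like the major profile on G.
g_major_weights = [MAJOR_PROFILE[(pc - 7) % 12] for pc in range(12)]
r, tonic, mode = estimate_key(g_major_weights)
print(tonic, mode, round(r, 3))
```

Pearson correlation is invariant to the scale and offset of the weights, so raw counts, durations, or averaged weights could all serve as input under this sketch's assumptions.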

Paper Structure

This paper contains 30 sections, 8 equations, 9 figures, 3 tables.

Figures (9)

  • Figure 1: The architecture of the brain-inspired mode learning spiking neural network. The model contains the music theory subsystem (MTS) and the sequential memory subsystem (SMS). The MTS contains the mode cluster and the key clusters, which are responsible for encoding the related music knowledge. The SMS receives the symbolic representation of the pitches and the durations, encoding and memorizing the relationships of the ordered notes. The structures of the pitch and duration subnetworks are marked by the dashed circles.
  • Figure 2: The music theory subsystem. (A) shows how a neuron group in the first layer of the mode cluster represents the major mode: the seven neurons drawn as orange parallelograms encode the diatonic tones, and the gray ones encode the chromatic tones. (B) uses the key of G major to illustrate the principle: neurons drawn as green parallelograms encode the diatonic tones G, A, B, C, D, E, F#, and the gray parallelograms encode the remaining tones. (C) shows the hierarchical connection architecture of the mode cluster.
  • Figure 3: The learning process of music guided by the mode theory.
  • Figure 4: The generation process of a musical piece in G minor, which begins with a tonic chord as the seed. The model receives this seed input and, step by step, activates the neurons representing pitches distributed across the four parts, guided by the active neurons that represent G minor. Neurons and synaptic connections not involved in the generation are omitted for clarity. Green circles represent neurons within the G minor group of the key cluster, blue and orange circles denote the pitch and duration neurons in the sequential memory subsystem, and red circles mark neurons activated at different time steps.
  • Figure 5: Comparison between the connection architecture of our model and the Krumhansl-Schmuckler model. Panel (A) shows the average synaptic weight and synaptic count of neurons within the major cluster and the pitch subnetwork trained on the SHTE dataset, while panel (B) shows these metrics for the minor cluster and the pitch subnetwork, also trained on SHTE. Panels (C) and (D) show the same metrics for the major and minor clusters, respectively, and the pitch subnetwork trained on the Bach corpus.
  • ...and 4 more figures
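
The diatonic/chromatic split described in the Figure 2 caption can be sketched in a few lines. This is only an illustration of the music-theory rule, assuming a 12-tone pitch-class encoding with C = 0 and sharp-based note names; the names `MAJOR_OFFSETS`, `diatonic_tones`, and `chromatic_tones` are hypothetical and do not describe the paper's neuron-level implementation.

```python
PITCH_NAMES = ["C", "C#", "D", "D#", "E", "F",
               "F#", "G", "G#", "A", "A#", "B"]

# Semitone offsets of the seven major-scale degrees above the tonic.
MAJOR_OFFSETS = [0, 2, 4, 5, 7, 9, 11]


def diatonic_tones(tonic, offsets=MAJOR_OFFSETS):
    """Names of the diatonic tones of a key, in scale order from the tonic."""
    tonic_pc = PITCH_NAMES.index(tonic)
    return [PITCH_NAMES[(tonic_pc + o) % 12] for o in offsets]


def chromatic_tones(tonic, offsets=MAJOR_OFFSETS):
    """Names of the remaining (non-diatonic) tones, in pitch-class order."""
    inside = set(diatonic_tones(tonic, offsets))
    return [name for name in PITCH_NAMES if name not in inside]


print(diatonic_tones("G"))   # ['G', 'A', 'B', 'C', 'D', 'E', 'F#']
print(chromatic_tones("G"))  # ['C#', 'D#', 'F', 'G#', 'A#']
```

Passing a different offset list (for example `[0, 2, 3, 5, 7, 8, 10]` for natural minor) gives the corresponding split for another mode under the same assumptions.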