Mode-conditioned music learning and composition: a spiking neural network inspired by neuroscience and psychology
Qian Liang, Yi Zeng, Menghaoran Tang
TL;DR
This work addresses the gap between AI-generated music and human cognitive understanding by proposing a brain-inspired spiking neural network that learns Western modes and keys and generates four-part harmony conditioned on mode and key. The architecture combines a Music Theory Subsystem with a Sequential Memory Subsystem, using neural circuit evolution and STDP-based learning to form cross-subsystem connections that encode tonal hierarchies similar to those of the Krumhansl-Schmuckler (KS) model. The learned connections align closely with the KS profiles, and the generated four-part compositions reflect the tonal characteristics of the conditioning keys and modes while retaining melodic adaptability; the results are validated against the SHTE and Bach datasets. By integrating neuroscience, psychology, and music theory within a spiking framework, the method offers a cognitively grounded pathway toward expressive, tonally coherent music generation, with potential implications for harmonic learning and emotion-aware composition.
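STDP-based learning strengthens a synapse when the presynaptic neuron fires shortly before the postsynaptic one and weakens it when the order is reversed. The sketch below illustrates the common pair-based exponential STDP rule with all-to-all spike pairing. It is only an assumption-laden illustration: the paper's exact rule, neuron model, and parameters are not given in this section, so the amplitudes, time constants, weight bound, and the function names `stdp_dw` and `update_weight` are hypothetical.

```python
import math

# Hypothetical parameters for illustration only (not taken from the paper).
A_PLUS = 0.01     # potentiation amplitude
A_MINUS = 0.012   # depression amplitude
TAU_PLUS = 20.0   # potentiation time constant (ms)
TAU_MINUS = 20.0  # depression time constant (ms)
W_MAX = 1.0       # upper bound on the synaptic weight


def stdp_dw(t_pre: float, t_post: float) -> float:
    """Weight change for one pre/post spike pair under pair-based exponential STDP."""
    dt = t_post - t_pre
    if dt > 0:
        # Pre fires before post: causal pairing potentiates the synapse.
        return A_PLUS * math.exp(-dt / TAU_PLUS)
    if dt < 0:
        # Post fires before pre: anti-causal pairing depresses the synapse.
        return -A_MINUS * math.exp(dt / TAU_MINUS)
    return 0.0  # simultaneous spikes: no change in this sketch


def update_weight(w: float, pre_spikes: list[float], post_spikes: list[float]) -> float:
    """Sum STDP contributions over all spike pairs and clip the weight to [0, W_MAX]."""
    dw = sum(stdp_dw(t_pre, t_post) for t_pre in pre_spikes for t_post in post_spikes)
    return min(max(w + dw, 0.0), W_MAX)


# Toy example: a hypothetical "key" neuron that repeatedly fires 5 ms before a
# hypothetical "scale-degree" neuron ends up strongly connected to it.
w = 0.5
for _ in range(50):
    w = update_weight(w, pre_spikes=[0.0, 100.0], post_spikes=[5.0, 105.0])
print(round(w, 3))  # 1.0 (saturated at W_MAX)
```

Clipping to a fixed range is one simple way to keep repeated potentiation from growing weights without bound; the paper may use a different normalization or bound.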
Abstract
Musical mode is one of the most critical elements of music: it establishes the framework of pitch organization and determines harmonic relationships. Previous works often rely on simplistic, rigid alignment methods and overlook the diversity of modes. In contrast to such AI models, humans possess cognitive mechanisms for perceiving various modes and keys. In this paper, we propose a spiking neural network inspired by brain mechanisms and psychological theories to represent musical modes and keys, ultimately generating musical pieces that incorporate tonality features. Our contributions are as follows: 1) The model is designed with multiple collaborating subsystems inspired by the structures and functions of the corresponding brain regions. 2) We incorporate mechanisms for neural circuit evolutionary learning that enable the network to learn and generate mode-related features in music, reflecting the cognitive processes involved in human music perception. 3) The results demonstrate that the proposed model develops a connection framework closely resembling the Krumhansl-Schmuckler model, one of the most significant key perception models in music psychology. 4) Experiments show that the model can generate music pieces with the characteristics of the given modes and keys. Additionally, quantitative assessments of the generated pieces reveal that they exhibit both tonality characteristics and the melodic adaptability needed to produce diverse, musical content. By combining insights from neuroscience, psychology, and music theory with advanced neural network architectures, our research aims to create a system that not only learns and generates music but also bridges the gap between human cognition and artificial intelligence.
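The Krumhansl-Schmuckler model estimates a key by correlating a piece's pitch-class duration distribution with each of the 24 major and minor key profiles, rotated to every possible tonic, and choosing the best match. The sketch below illustrates that classic key-finding procedure, assuming NumPy and the standard Krumhansl-Kessler probe-tone profiles. It is the reference model the learned connections are compared against, not the proposed spiking network; the helper names `pitch_class_durations` and `ks_key_estimate` are hypothetical.

```python
import numpy as np

# Krumhansl-Kessler probe-tone profiles: index 0 is the tonic, then ascending semitones.
MAJOR_PROFILE = np.array([6.35, 2.23, 3.48, 2.33, 4.38, 4.09, 2.52, 5.19, 2.39, 3.66, 2.29, 2.88])
MINOR_PROFILE = np.array([6.33, 2.68, 3.52, 5.38, 2.60, 3.53, 2.54, 4.75, 3.98, 2.69, 3.34, 3.17])
PITCH_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]


def pitch_class_durations(notes):
    """Total duration per pitch class for a list of (midi_pitch, duration) pairs."""
    hist = np.zeros(12)
    for pitch, dur in notes:
        hist[pitch % 12] += dur
    return hist


def ks_key_estimate(hist):
    """Return (tonic_name, mode, r) for the best-correlated of the 24 major/minor keys."""
    best = None
    for mode, profile in (("major", MAJOR_PROFILE), ("minor", MINOR_PROFILE)):
        for tonic in range(12):
            # np.roll moves the profile's tonic weight onto pitch class `tonic`.
            r = np.corrcoef(hist, np.roll(profile, tonic))[0, 1]
            if best is None or r > best[2]:
                best = (PITCH_NAMES[tonic], mode, r)
    return best


# An ascending C-major scale with the tonic and dominant held longer.
melody = [(60, 2.0), (62, 1.0), (64, 1.0), (65, 1.0), (67, 2.0), (69, 1.0), (71, 1.0), (72, 2.0)]
tonic, mode, r = ks_key_estimate(pitch_class_durations(melody))
print(tonic, mode)  # C major
```

Weighting pitch classes by total duration rather than by note count follows the usual Krumhansl-Schmuckler formulation, so sustained tones such as the tonic and dominant pull the estimate more strongly.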
