Segment-Factorized Full-Song Generation on Symbolic Piano Music
Ping-Yi Chen, Chih-Pin Tan, Yi-Hsuan Yang
TL;DR
The paper addresses full-song symbolic piano generation, where a model must balance long-range structural coherence with local motif development. It introduces the Segmented Full-Song Model (SFS), which decomposes a song into segments and generates each one with a Transformer-based generator conditioned on four contextual sources (Left, Right, Seed, Ref) plus a global summary encoder $G$, using frame-based tokenization and specialized positional encodings. The key contributions are a factorized joint probability framework, selective attention to context segments, demonstrated improvements in seed adherence and structural coherence, and real-time generation that enables interactive human-AI composition. Code, weights, and a web interface are released as open source. The approach targets interactive music creation and scalable full-song generation.
Abstract
We propose the Segmented Full-Song Model (SFS) for symbolic full-song generation. The model accepts a user-provided song structure and an optional short seed segment that anchors the main idea around which the song is developed. By factorizing a song into segments and generating each one through selective attention to related segments, the model achieves higher quality and efficiency than prior work. To demonstrate its suitability for human-AI interaction, we further wrap SFS into a web application that lets users iteratively co-create music on a piano roll with customizable structures and flexible ordering.
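To make "selective attention to related segments" concrete, the following is a minimal sketch of how the context for one target segment might be chosen. It is not the authors' implementation. The function name `select_context`, the list-of-labels representation of the song structure, and the rule used for Ref (the earliest finished segment with the same label as the target) are all assumptions made for illustration. Left and Right are taken to be the adjacent segments, if already generated.

```python
def select_context(labels, done, target, seed=None):
    """Pick the segments a generator would attend to for `target`.

    labels: section label per segment, e.g. ["A", "B", "A", "B"]
    done:   set of indices of segments that already have content
    target: index of the segment to generate next
    seed:   index of the user-provided seed segment, if any
    """
    def if_done(i):
        # Return the index only if it is in range and already generated.
        return i if 0 <= i < len(labels) and i in done else None

    # Assumption: Ref is the earliest finished segment sharing the target's label.
    ref = next(
        (i for i in sorted(done) if i != target and labels[i] == labels[target]),
        None,
    )
    return {
        "left": if_done(target - 1),
        "right": if_done(target + 1),
        "seed": seed if seed in done else None,
        "ref": ref,
    }


labels = ["A", "B", "A", "B"]  # hypothetical user-provided structure
done = {0}                     # the seed occupies segment 0
order = [2, 1, 3]              # arbitrary generation order
contexts = {}
for t in order:
    contexts[t] = select_context(labels, done, t, seed=0)
    done.add(t)
```

Because each segment depends only on a handful of context segments rather than on the whole song, the loop can visit segments in any order, which is what the flexible ordering in the web application relies on.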
