F-StrIPE: Fast Structure-Informed Positional Encoding for Symbolic Music Generation

Manvi Agarwal, Changhong Wang, Gael Richard

TL;DR

This work tackles the challenge of long-range coherence in symbolic music generation by introducing F-StrIPE, a fast structure-informed positional encoding that operates with linear complexity using kernel-based Random Fourier Features. By replacing standard time indices with multi-resolution structural labels and unifying structure-aware sinusoidal features with RFF, the method generalizes Stochastic Positional Encoding (SPE) to incorporate musical priors without incurring the usual quadratic cost. Empirical results on melody harmonization demonstrate that structure-informed PEs, especially at the chord level, substantially improve metrics of musicality (CS, GS, NDD) over NoPE and SPE baselines, while maintaining efficiency. The findings highlight the practical impact of combining domain priors with kernel-based attention for scalable, high-quality symbolic music generation, and point to future work on richer structures and mixed-approximation strategies.

Abstract

While music remains a challenging domain for generative models like Transformers, recent progress has been made by exploiting suitable musically-informed priors. One technique to leverage information about musical structure in Transformers is inserting such knowledge into the positional encoding (PE) module. However, Transformers carry a quadratic cost in sequence length. In this paper, we propose F-StrIPE, a structure-informed PE scheme that works in linear complexity. Using existing kernel approximation techniques based on random features, we show that F-StrIPE is a generalization of Stochastic Positional Encoding (SPE). We illustrate the empirical merits of F-StrIPE using melody harmonization for symbolic music.
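The random-feature idea mentioned in the abstract can be illustrated with a minimal, generic sketch of Random Fourier Features. It is not the paper's formulation. It assumes scalar position labels, a Gaussian kernel with unit bandwidth, and hypothetical names (`rff_features`, `labels`, `NUM_FREQS`) chosen only for this example. The point it shows is that a shift-invariant kernel between two positions can be approximated by a dot product of per-position features, so no pairwise matrix needs to be formed.

```python
import math
import random


def rff_features(position, freqs):
    """Map a scalar position p to [cos(w*p)..., sin(w*p)...] / sqrt(D)."""
    scale = 1.0 / math.sqrt(len(freqs))
    feats = [scale * math.cos(w * position) for w in freqs]
    feats += [scale * math.sin(w * position) for w in freqs]
    return feats


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def gaussian_kernel(p, q):
    """Exact Gaussian kernel with unit bandwidth (an assumption of this sketch)."""
    return math.exp(-0.5 * (p - q) ** 2)


# Frequencies drawn from N(0, 1), the spectral density of the unit Gaussian kernel.
rng = random.Random(0)
NUM_FREQS = 20000
freqs = [rng.gauss(0.0, 1.0) for _ in range(NUM_FREQS)]

# Hypothetical structural labels: tokens in the same segment share a label,
# instead of each token carrying its own time index.
labels = [0.0, 0.0, 1.0, 1.0, 2.5]
features = [rff_features(p, freqs) for p in labels]

# Since cos(wp)cos(wq) + sin(wp)sin(wq) = cos(w(p - q)), the dot product is a
# Monte Carlo estimate of E[cos(w(p - q))] = exp(-(p - q)^2 / 2).
approx = dot(features[0], features[2])
exact = gaussian_kernel(labels[0], labels[2])
print(f"approx={approx:.4f} exact={exact:.4f}")
```

Because each feature vector depends on a single position, the features can be computed once per token. This is the property that kernel-based attention methods rely on to avoid materializing the full pairwise matrix.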

Paper Structure

This paper contains 22 sections, 9 equations, 2 figures, 2 tables.

Figures (2)

  • Figure 1: A schematic showing our main contributions (best viewed in colour).
  • Figure 2: A visual representation of the connection between Stochastic Positional Encoding and Random Fourier Features (best viewed in colour). The variables referenced here are detailed in the Background and Methods sections.