LZMidi: Compression-Based Symbolic Music Generation

Connor Ding, Abhiram Gorle, Sagnik Bhattacharya, Divija Hasteer, Naomi Sagan, Tsachy Weissman

TL;DR

LZMidi tackles the scalability challenge of symbolic music generation with an LZ78-based sequential probability assignment (SPA) that builds a prefix tree from training data and predicts the next symbol without heavy neural parameters. It comes with a universal convergence guarantee: the SPA probabilities approach the true data distribution as more training sequences are observed. Empirically, on the Lakh MIDI Dataset, LZMidi achieves competitive perceptual quality (measured by FAD, WD, and KL) while sharply reducing training and generation costs on CPUs compared to discrete diffusion baselines such as ASD3PM. The approach is a resource-efficient alternative for symbolic music generation with theoretical backing, and it points to future work on longer, polyphonic sequences and broader baselines. Overall, the work presents compression-based learning as a practical path to scalable, interpretable sequence generation in music.

Abstract

Recent advances in symbolic music generation primarily rely on deep learning models such as Transformers, GANs, and diffusion models. While these approaches achieve high-quality results, they require substantial computational resources, limiting their scalability. We introduce LZMidi, a lightweight symbolic music generation framework based on a Lempel-Ziv (LZ78)-induced sequential probability assignment (SPA). By leveraging the discrete and sequential structure of MIDI data, our approach enables efficient music generation on standard CPUs with minimal training and inference costs. Theoretically, we establish universal convergence guarantees for our approach, underscoring its reliability and robustness. Compared to state-of-the-art diffusion models, LZMidi achieves competitive Fréchet Audio Distance (FAD), Wasserstein Distance (WD), and Kullback-Leibler (KL) scores while significantly reducing computational overhead: up to 30x faster training and 300x faster generation. Our results position LZMidi as a significant advancement in compression-based learning, highlighting how universal compression techniques can efficiently model and generate structured sequential data, such as symbolic music, with practical scalability and theoretical rigor.
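
To make the idea of an LZ78-induced SPA concrete, here is a minimal Python sketch of a prefix tree that is grown by LZ78 parsing and then used to assign next-symbol probabilities and to sample. It is an illustration under stated assumptions, not the authors' implementation: the class name `LZ78SPA`, the additive smoothing constant `gamma`, the integer token encoding, and the rule of returning to the root when generation leaves the tree are all hypothetical choices made for this example.

```python
import random


class LZ78SPA:
    """Toy LZ78 prefix tree with smoothed next-symbol probabilities (illustrative only)."""

    def __init__(self, alphabet_size, gamma=0.5):
        self.alphabet_size = alphabet_size
        self.gamma = gamma                      # assumed additive smoothing constant
        self.children = [{}]                    # children[node][symbol] -> child node id
        self.counts = [[0] * alphabet_size]     # counts[node][symbol] -> times seen at node

    def train(self, sequence):
        """Parse one training sequence LZ78-style, growing the tree one leaf per phrase."""
        node = 0
        for symbol in sequence:
            self.counts[node][symbol] += 1
            child = self.children[node].get(symbol)
            if child is None:
                # The current phrase is new: add a leaf and restart from the root.
                self.children[node][symbol] = len(self.children)
                self.children.append({})
                self.counts.append([0] * self.alphabet_size)
                node = 0
            else:
                node = child

    def predict(self, node):
        """Smoothed next-symbol distribution at a tree node."""
        total = sum(self.counts[node]) + self.gamma * self.alphabet_size
        return [(c + self.gamma) / total for c in self.counts[node]]

    def generate(self, length, rng):
        """Sample symbols by walking the tree; fall back to the root off-tree (assumption)."""
        node, out = 0, []
        for _ in range(length):
            symbol = rng.choices(range(self.alphabet_size), weights=self.predict(node))[0]
            out.append(symbol)
            node = self.children[node].get(symbol, 0)
        return out


# Toy usage: integer tokens stand in for MIDI events.
model = LZ78SPA(alphabet_size=2)
model.train([0, 1, 0, 1, 0, 1])
sample = model.generate(8, random.Random(0))
```

Training and generation here are plain dictionary lookups and counter updates, which is consistent with the abstract's point that such a model can run on a standard CPU; real MIDI data would need a tokenization step that this sketch leaves out.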

Paper Structure

This paper contains 23 sections, 4 theorems, 22 equations, 5 figures, 5 tables.

Key Result

Theorem 3.1

Let $P$ be the law of a process with components taking values in a finite alphabet $\mathcal{X}$, and let $Q^m$ be the LZ78-based sequential probability assignment (SPA) constructed using $m$ i.i.d. training sequences from $P_{X^n}$. Then, for any fixed $n$, $$\lim_{m \to \infty} D\left(P_{X^n} \,\|\, Q^m_{X^n}\right) = 0,$$ where $D(\cdot\|\cdot)$ denotes the Kullback--Leibler divergence.
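
The theorem is stated in terms of the Kullback--Leibler divergence between the true law of length-$n$ sequences and a model built from $m$ training sequences. The sketch below only illustrates how that quantity behaves as $m$ grows. It makes strong simplifying assumptions: the true law is a toy i.i.d. Bernoulli source, and the estimator is a plain add-one count over whole sequences, which stands in for the LZ78 SPA and is not the paper's construction. All function names and parameters are hypothetical.

```python
import itertools
import math
import random


def kl_divergence(p, q):
    """D(p || q) in nats for two distributions given as dicts over the same support."""
    return sum(pv * math.log(pv / q[x]) for x, pv in p.items() if pv > 0)


def true_law(n, theta):
    """Law of n i.i.d. Bernoulli(theta) bits: a toy stand-in for P_{X^n}."""
    return {
        seq: math.prod(theta if bit else 1 - theta for bit in seq)
        for seq in itertools.product((0, 1), repeat=n)
    }


def smoothed_estimate(samples, support):
    """Add-one estimate of the sequence law (a stand-in, NOT the LZ78 SPA)."""
    counts = {x: 1 for x in support}
    for s in samples:
        counts[s] += 1
    total = sum(counts.values())
    return {x: c / total for x, c in counts.items()}


def divergence_after(m, n=3, theta=0.3, seed=0):
    """KL divergence between the true law and an estimate built from m samples."""
    rng = random.Random(seed)
    p = true_law(n, theta)
    support = list(p)
    samples = rng.choices(support, weights=[p[x] for x in support], k=m)
    return kl_divergence(p, smoothed_estimate(samples, support))


# Divergence for increasing numbers of training sequences (n is held fixed).
results = {m: divergence_after(m) for m in (10, 1000, 100000)}
```

With $n$ fixed, the divergence in `results` shrinks as `m` increases, which is the qualitative behavior the theorem's title describes for the LZ78 SPA.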

Figures (5)

  • Figure 1: Sample MIDI File
  • Figure 2: Data Distribution including 0 & 1
  • Figure 3: Data Distribution without 0 & 1
  • Figure 4: MIDI plots for Generated Samples using the LZMidi Model.
  • Figure 5: MIDI plots for Generated Samples using the D3PM Model.

Theorems & Definitions (11)

  • Theorem 3.1: Universal Convergence of LZ78-SPA
  • Remark 3.2
  • Remark 6.1
  • Definition 2
  • Remark 3
  • Definition 4: Symbol counts
  • Theorem 5
  • proof
  • proof
  • Corollary 7
  • ...and 1 more