Local deployment of large-scale music AI models on commodity hardware

Xun Zhou, Charlie Ruan, Zihe Zhao, Tianqi Chen, Chris Donahue

TL;DR

We present the MIDInfinite, a web application that generates symbolic music with a large-scale generative AI model running locally on commodity hardware, and we envision that Machine Learning Compilation (MLC) can help bridge the gap between increasingly capable music AI models and the technology that music software developers already know.

Abstract

We present the MIDInfinite, a web application that generates symbolic music with a large-scale generative AI model running locally on commodity hardware. Building this demo involved porting the Anticipatory Music Transformer, a large language model (LLM) pre-trained on the Lakh MIDI dataset, to the Machine Learning Compilation (MLC) framework. Once a model is ported, MLC supports inference on a variety of runtimes, including C++, mobile, and the browser. We envision that MLC can help bridge the gap between increasingly capable music AI models and the technology that music software developers already know. As a proof of concept, we build a web application that lets users generate endless streams of multi-instrumental MIDI in the browser, either from scratch or conditioned on a prompt. On commodity hardware (an M3 MacBook Pro), our demo generates 51 notes per second, which is faster than real-time playback for 72.9% of generations; with 2 seconds of upfront buffering, this rises to 86.3%.
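The real-time claim above can be made concrete with a small check. The sketch below rests on our own assumption, not the paper's stated definition: a generation "keeps up" if every note is produced before playback reaches it, where playback starts once the first note exists plus an optional upfront buffer. The function names and input format (per-note generation timestamps and musical onsets, both in seconds) are hypothetical.

```python
from typing import Iterable, Sequence, Tuple


def keeps_up(gen_times: Sequence[float], onsets: Sequence[float],
             buffer_s: float = 0.0) -> bool:
    """Hypothetical check: is every note generated before playback needs it?

    gen_times[i]: wall-clock seconds (from generation start) at which note i
                  became available.
    onsets[i]:    musical onset of note i in seconds, in generation order.
    Assumption: playback starts `buffer_s` seconds after the first note is
    available, so note i is due at gen_times[0] + buffer_s + (onsets[i] - onsets[0]).
    """
    if len(gen_times) != len(onsets):
        raise ValueError("gen_times and onsets must have the same length")
    if not gen_times:
        return True
    start = gen_times[0] + buffer_s
    return all(g <= start + (o - onsets[0]) for g, o in zip(gen_times, onsets))


def fraction_keeping_up(runs: Iterable[Tuple[Sequence[float], Sequence[float]]],
                        buffer_s: float = 0.0) -> float:
    """Share of (gen_times, onsets) runs that never stall playback."""
    runs = list(runs)
    if not runs:
        return 0.0
    return sum(keeps_up(g, o, buffer_s) for g, o in runs) / len(runs)


# Usage: a hypothetical generator producing a constant 51 notes per second.
gen = [(i + 1) / 51 for i in range(100)]
sparse = [0.05 * i for i in range(100)]  # one note every 50 ms
burst = [0.0] * 100                      # 100 notes sounding at once
print(fraction_keeping_up([(gen, sparse), (gen, burst)]))                # 0.5
print(fraction_keeping_up([(gen, sparse), (gen, burst)], buffer_s=2.0))  # 1.0
```

In this toy setup the sparse passage keeps up without any buffer, while the dense burst only keeps up once a 2-second buffer absorbs it. This is a qualitative illustration of why upfront buffering raises the share of streamable generations, not a reproduction of the paper's numbers.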

Paper Structure

This paper contains 4 sections, 4 figures, 1 table.

Figures (4)

  • Figure 1: Proposed workflow for bridging the gap between the large-model and music software ecosystems. It involves contributions from both model providers (porting models to MLC) and music software developers (building applications on the resulting runtimes). Here we build a web demo of a state-of-the-art symbolic music generation model as a proof of concept.
  • Figure 1: Profiling of different models, runtimes, and commodity chips. "Streamable" is the percentage of time during which the generation stream stays ahead of the playback stream, with and without an initial 2 s playback buffer.
  • Figure 2: Streaming performance for the small model. To stream, a chip's generation throughput (dashed lines) must exceed the tokens per second (tok/s) consumed by music playback (solid line), which varies with generated note density (shaded band: ±1 standard deviation).
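The comparison in Figure 2, chip throughput versus the token rate that playback consumes, can be sketched as follows. We assume, hypothetically, that each note costs 3 tokens, as in the Anticipatory Music Transformer's (time, duration, note) event triples; under that assumption, 51 notes per second would be roughly 153 tok/s. The window-based "streamable fraction" here is our own simplification of the metric described above, not the paper's measurement code, and all names are illustrative.

```python
from statistics import mean, pstdev
from typing import List, Sequence

TOKENS_PER_NOTE = 3  # assumption: one (time, duration, note) token triple per note


def playback_tok_rates(onsets: Sequence[float], window_s: float = 1.0,
                       tokens_per_note: int = TOKENS_PER_NOTE) -> List[float]:
    """Tokens per second that playback consumes in each fixed window.

    Assumes non-negative onsets in seconds; windows with no notes count as 0 tok/s.
    """
    if window_s <= 0:
        raise ValueError("window_s must be positive")
    if not onsets:
        return []
    counts = [0] * (int(max(onsets) // window_s) + 1)
    for t in onsets:
        counts[int(t // window_s)] += 1
    return [c * tokens_per_note / window_s for c in counts]


def streamable_fraction(chip_tok_s: float, rates: Sequence[float]) -> float:
    """Share of windows in which the chip's throughput meets the playback rate."""
    if not rates:
        return 0.0
    return sum(chip_tok_s >= r for r in rates) / len(rates)


onsets = [0.0, 0.1, 0.2, 0.5, 1.2, 2.0, 2.1, 2.2, 2.3, 2.4, 2.5]
rates = playback_tok_rates(onsets)
print(rates)                                 # [12.0, 3.0, 18.0]
print(mean(rates), round(pstdev(rates), 2))  # playback tok/s: mean and ±1 stdev
print(streamable_fraction(15.0, rates))      # hypothetical chip at 15 tok/s
```

The window length is a design choice: short windows expose dense bursts that a chip cannot keep up with, while long windows average them away, which is roughly the role an upfront playback buffer plays in practice.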