Interpreting Graphic Notation with MusicLDM: An AI Improvisation of Cornelius Cardew's Treatise

Tornike Karchkhadze, Keren Shao, Shlomo Dubnov

TL;DR

This paper addresses the problem of interpreting the graphic notation of Cornelius Cardew's Treatise. The authors present a two-stage AI pipeline: ChatGPT 4o first converts score imagery into descriptive text prompts, and MusicLDM then converts those prompts into audio, with an outpainting method smoothing the transitions between generated segments. They demonstrate the approach by generating and analyzing multiple tracks and by detailing their compositional setup and prompts. The work expands the creative toolkit for contemporary/experimental music by showing how cross-modal AI can translate abstract visuals into expressive sound, and it suggests direct visual-to-audio mappings as a path toward finer control.

Abstract

This work presents a novel method for composing and improvising music inspired by Cornelius Cardew's Treatise, using AI to bridge graphic notation and musical expression. By leveraging OpenAI's ChatGPT to interpret the abstract visual elements of Treatise, we convert these graphical images into descriptive textual prompts. These prompts are then input into MusicLDM, a pre-trained latent diffusion model designed for music generation. We introduce a technique called "outpainting," which overlaps sections of AI-generated music to create a seamless and cohesive composition. We demonstrate a new perspective on performing and interpreting graphic scores, showing how AI can transform visual stimuli into sound and expand the creative possibilities in contemporary/experimental music composition. Musical pieces are available at https://bit.ly/TreatiseAI
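The overlap idea behind "outpainting" can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify segment lengths, overlap sizes, or how MusicLDM is conditioned, and the paper's method operates inside the latent diffusion model rather than on raw samples. The names `outpaint_sequence` and `toy_generate`, the generator signature, and the use of plain sample lists are all assumptions made for illustration only.

```python
from typing import Callable, List, Sequence

# Hypothetical generator signature (an assumption, not a MusicLDM API):
# (prompt, context samples, segment length) -> generated samples.
Generator = Callable[[str, Sequence[float], int], List[float]]


def outpaint_sequence(
    prompts: Sequence[str],
    generate: Generator,
    segment_len: int,
    overlap: int,
) -> List[float]:
    """Stitch one segment per prompt, reusing the tail of the track as context."""
    if not 0 <= overlap < segment_len:
        raise ValueError("overlap must satisfy 0 <= overlap < segment_len")

    track: List[float] = []
    for prompt in prompts:
        # The last `overlap` samples of the track so far condition the next segment.
        context = track[-overlap:] if overlap and track else []
        segment = generate(prompt, context, segment_len)
        if len(segment) != segment_len:
            raise ValueError("generator returned a segment of unexpected length")
        # The overlapped region is already in the track; append only the new part.
        track.extend(segment[len(context):])
    return track


def toy_generate(prompt: str, context: Sequence[float], segment_len: int) -> List[float]:
    """Stand-in generator: keeps the context and fills the rest with a constant."""
    fill = float(len(prompt))
    return list(context) + [fill] * (segment_len - len(context))


if __name__ == "__main__":
    track = outpaint_sequence(["a", "bb", "ccc"], toy_generate, segment_len=4, overlap=2)
    print(len(track), track)
```

In this sketch the first segment contributes `segment_len` samples and each later segment contributes `segment_len - overlap` new samples, since its opening region is shared with the preceding audio. A real generator would replace `toy_generate` and would need to produce a continuation consistent with the supplied context.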

Paper Structure

This paper contains 15 sections, 1 equation, 5 figures.

Figures (5)

  • Figure 1: Treatise Improvisation Pipeline: The graphic scores are first processed by ChatGPT-4 to generate text prompts, which are then fed into MusicLDM, a latent diffusion model. Smooth transitions in the stitched audio output are achieved using the "outpainting" technique, illustrated here by the feedback loop around the latent diffusion model. Detailed explanations of this method can be found in the Methods section.