Generative AI for Music and Audio

Hao-Wen Dong

TL;DR

This dissertation introduces the three main directions of the research centered around generative AI for music and audio: 1) multitrack music generation, 2) assistive music creation tools, and 3) multimodal learning for audio and music.

Abstract

Generative AI has been transforming the way we interact with technology and consume content. In the next decade, AI technology will reshape how we create audio content in various media, including music, theater, films, games, podcasts, and short videos. In this dissertation, I introduce the three main directions of my research centered around generative AI for music and audio: 1) multitrack music generation, 2) assistive music creation tools, and 3) multimodal learning for audio and music. Through my research, I aim to answer the following two fundamental questions: 1) How can AI help professionals or amateurs create music and audio content? 2) Can AI learn to create music in a way similar to how humans learn music? My long-term goal is to lower the barrier to entry for music composition and democratize audio content creation.

Paper Structure

This paper contains 132 sections, 10 equations, 48 figures, 27 tables.

Figures (48)

  • Figure 1: Looking back, the interaction between music and technology has always been a two-way process. (Left) The violin-making industry grew alongside classical music, and together they created the golden age of classical music. (Right) The invention and development of synthesizers and drum machines helped popularize electronic music. Image sources (from left to right): 1) Mozart83, Public domain, via Wikimedia Commons, 2) Hildegard Dodel, Public domain, via Wikimedia Commons, 3) yan, CC BY-SA 4.0, via Wikimedia Commons, and 4) taken at Hamamatsu Museum of Musical Instruments, August 2019.
  • Figure 2: An overview of the three main directions of my research.
  • Figure 3: An example of a learning-based music generation system. MusPy provides basic routines specific to music as well as interfaces to machine learning frameworks.
  • Figure 4: System diagram of MusPy. The MusPy Music object at the center is the core element of MusPy.
  • Figure 5: Examples of (a) training data preparation and (b) result writing pipelines using MusPy.
  • ...and 43 more figures