The Concatenator: A Bayesian Approach To Real Time Concatenative Musaicing

Christopher Tralie, Ben Cantil

TL;DR

The Concatenator's real-time capability unlocks new avenues for musical expression and control suited to live performance and modular synthesis integration, and represents an essential breakthrough in concatenative synthesis technology.

Abstract

We present "The Concatenator," a real-time system for audio-guided concatenative synthesis. Similar to Driedger et al.'s "musaicing" (or "audio mosaicing") technique, we concatenate a set number of windows within a corpus of audio to re-create the harmonic and percussive aspects of a target audio stream. Unlike Driedger et al.'s NMF-based technique, however, we take an explicitly Bayesian point of view, where corpus window indices are hidden states and the target audio stream is an observation. We use a particle filter to infer the best hidden corpus states in real time. Our transition model includes a tunable parameter to control the time-continuity of corpus grains, and our observation model allows users to prioritize how quickly windows change to match the target. Because the computational complexity of the system is independent of the corpus size, our system scales to corpora that are hours long, which is an important feature in the age of vast audio data collections. Within The Concatenator module itself, composers can vary grain length, fit to target, and pitch shift in real time while reacting to the sounds they hear, enabling them to rapidly iterate on ideas. We evaluate our system with extensive quantitative tests of the effects of parameters, as well as a qualitative evaluation with artistic insights. Based on the quality of the results, we believe the real-time capability unlocks new avenues for musical expression and control suited to live performance and modular synthesis integration, and that it represents an essential breakthrough in concatenative synthesis technology.

Paper Structure

This paper contains 12 sections, 12 equations, 6 figures.

Figures (6)

  • Figure 1: The Concatenator maintains $P$ "particles," each of which represents $p$ specific corpus windows. Each window moves forward by 1 timestep in the corpus with probability $p_d$, or otherwise jumps randomly. Then, particles each mix their windows to best match the target, and particles with the top 10% best fits to the target vote on a final set of $p$ windows.
  • Figure 2: The activation distribution of The Concatenator is more exponential in nature than that of Driedger et al. In fact, it exactly follows the geometric distribution if the temperature $\tau = 0$.
  • Figure 3: Increasing polyphony leads to a better fit (ratios $<1$), and increasing particles leads to a better fit, especially for larger corpora like the Woodwinds ($\approx$1.6hrs).
  • Figure 4: Increasing $p_d$ increases the average grain length since windows are less likely to jump at each timestep.
  • Figure 5: Increasing $\tau$ decreases the average grain length since this prioritizes the observation probability.
  • ...and 1 more figure
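
The loop described in the Figure 1 caption (advance or jump, mix to fit the target, let the best particles vote) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the authors' implementation: the function names `propagate`, `fit_error`, and `step` are hypothetical, the mixing step uses Euclidean multiplicative updates as a stand-in for whatever fit the paper uses, the corpus wraps around at its end, the vote is a simple count of window indices, and particle resampling and the temperature $\tau$ are omitted entirely.

```python
import numpy as np

def propagate(states, n_windows, p_d, rng):
    """Transition model from the Figure 1 caption: each window advances one
    corpus step with probability p_d, otherwise jumps to a uniformly random
    window. Wrapping at the end of the corpus is an assumption of this sketch."""
    advance = rng.random(states.shape) < p_d
    jumps = rng.integers(0, n_windows, size=states.shape)
    return np.where(advance, (states + 1) % n_windows, jumps)

def fit_error(W, target, idx, n_iter=10):
    """Mix the corpus windows W[:, idx] with nonnegative weights h so that they
    approximate the target spectrum, and return the residual norm.
    Assumption: Euclidean multiplicative updates stand in for the real fit."""
    D = W[:, idx]                      # (n_bins, p) spectra of chosen windows
    h = np.ones(len(idx))              # nonnegative mixing weights
    eps = 1e-12                        # avoids division by zero
    for _ in range(n_iter):
        h *= (D.T @ target) / (D.T @ (D @ h) + eps)
    return np.linalg.norm(target - D @ h)

def step(W, target, states, p_d, p, rng, top_frac=0.1):
    """One time step: propagate all particles, score each against the target,
    then let the best top_frac of particles vote on p output windows.
    Assumption: a vote is a plain count of how often each index appears."""
    n_windows = W.shape[1]
    states = propagate(states, n_windows, p_d, rng)
    errors = np.array([fit_error(W, target, s) for s in states])
    n_top = max(1, int(top_frac * len(states)))
    best = np.argsort(errors)[:n_top]              # lowest error = best fit
    votes = np.bincount(states[best].ravel(), minlength=n_windows)
    chosen = np.argsort(-votes)[:p]                # p most-voted windows
    return states, chosen

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W = rng.random((16, 50))                       # toy corpus: 50 windows
    target = W[:, 7]                               # target equals one window
    states = rng.integers(0, 50, size=(100, 3))    # P = 100 particles, p = 3
    states, chosen = step(W, target, states, p_d=0.9, p=3, rng=rng)
    print(chosen)
```

In this toy version the cost of a step depends on the number of particles $P$ and the polyphony $p$ but not on the number of corpus windows, apart from the vote tally, which could equally be kept in a dictionary. If the observation model is ignored, a window's run of consecutive advances in `propagate` is geometrically distributed with mean $1/(1 - p_d)$, which is consistent with the trend in Figure 4.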