
A Mapping Strategy for Interacting with Latent Audio Synthesis Using Artistic Materials

Shuoyang Zheng, Anna Xambó Sedó, Nick Bryan-Kinns

TL;DR

This work addresses explainable, real-time control of latent audio synthesis using high-dimensional artistic data. It proposes a two-step mapping strategy: (i) latent mapping via unsupervised feature learning, which encodes artistic inputs into a compact latent representation, and (ii) Interactive Machine Learning (IML), which maps those features to the audio model's latent parameters. The demonstration couples a sketch-to-sound controller to RAVE, a latent audio-synthesis model: sketches are encoded into a $32$-dimensional latent space, which drives RAVE's $16$-dimensional latent space via OSC in a Max4Live pipeline, and the artist provides a few paired training examples to train the IML mapping. The results show real-time, perceptually meaningful control, and the work foregrounds temporal and cross-modal explainability as directions for future work.
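The TL;DR does not say which regression model the IML step uses. As a hedged illustration only, the sketch below swaps in inverse-distance-weighted k-nearest-neighbour regression, a common lightweight choice for learning from a handful of demonstrations, to map a 32-dimensional sketch embedding to a 16-dimensional latent vector. The class `KNNLatentMapper` and its methods are hypothetical; neither the sketch encoder nor RAVE is modelled here.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


class KNNLatentMapper:
    """Hypothetical IML-style regressor from sketch embeddings to synthesis latents."""

    def __init__(self, k: int = 3, eps: float = 1e-9) -> None:
        self.k = k          # number of nearest examples to blend
        self.eps = eps      # distance below which a query counts as an exact match
        self.examples: List[Tuple[Vector, Vector]] = []

    def add_example(self, sketch_z: Sequence[float], rave_z: Sequence[float]) -> None:
        # Each demonstration pairs one encoded sketch with one target latent point.
        if self.examples:
            in_dim, out_dim = len(self.examples[0][0]), len(self.examples[0][1])
            if len(sketch_z) != in_dim or len(rave_z) != out_dim:
                raise ValueError("dimension mismatch with earlier examples")
        self.examples.append((list(sketch_z), list(rave_z)))

    def predict(self, sketch_z: Sequence[float]) -> Vector:
        if not self.examples:
            raise RuntimeError("no training examples recorded")
        # Euclidean distance from the query to every stored input, nearest first.
        ranked = sorted(
            ((math.dist(sketch_z, x), y) for x, y in self.examples),
            key=lambda pair: pair[0],
        )
        neighbours = ranked[: self.k]
        # If the query coincides with a stored input, return that example's output.
        if neighbours[0][0] < self.eps:
            return list(neighbours[0][1])
        # Otherwise blend the neighbours' outputs, weighting each by 1 / distance.
        weights = [1.0 / d for d, _ in neighbours]
        total = sum(weights)
        out_dim = len(neighbours[0][1])
        return [
            sum(w * y[i] for w, (_, y) in zip(weights, neighbours)) / total
            for i in range(out_dim)
        ]


if __name__ == "__main__":
    mapper = KNNLatentMapper(k=2)
    mapper.add_example([0.0] * 32, [0.0] * 16)   # e.g. "empty sketch" -> one sound
    mapper.add_example([1.0] * 32, [1.0] * 16)   # e.g. "dense sketch" -> another
    print(mapper.predict([0.25] * 32))
```

With two examples at all-zeros and all-ones, a query of 0.25 in every dimension yields about 0.25 in every output dimension, because the nearer example receives three times the weight of the farther one. Any regressor that interpolates smoothly between a few examples could fill the same role.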

Abstract

This paper presents a mapping strategy for interacting with the latent spaces of generative AI models. Our approach uses unsupervised feature learning to encode a human control space and maps the result to an audio synthesis model's latent space. To demonstrate how this mapping strategy can turn high-dimensional sensor data into control mechanisms for a deep generative model, we present a proof-of-concept system that uses visual sketches to control an audio synthesis model. Drawing on emerging discourses in XAIxArts, we discuss how this approach can contribute to XAI in artistic and creative contexts, examine its current limitations, and propose future research directions.
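In the proof-of-concept system, the TL;DR states that mapped values reach RAVE's latent space via OSC, but it does not specify the message layout. The sketch below is an assumption-laden illustration that packs a 16-dimensional latent vector into one OSC 1.0 message with a float32 argument per dimension, using only the Python standard library. The address `/rave/latent`, the host, and the port are hypothetical placeholders, not the system's actual configuration.

```python
import socket
import struct
from typing import Sequence


def osc_string(s: str) -> bytes:
    """Encode an OSC-string: ASCII bytes plus a NUL, zero-padded to a multiple of 4."""
    data = s.encode("ascii") + b"\x00"
    return data + b"\x00" * (-len(data) % 4)


def pack_latent_message(address: str, values: Sequence[float]) -> bytes:
    """Pack a latent vector as one OSC message with one float32 argument per value."""
    if not address.startswith("/"):
        raise ValueError("OSC address patterns must start with '/'")
    # The type tag string starts with ',' and lists one 'f' per float argument.
    type_tags = "," + "f" * len(values)
    # OSC 1.0 float32 arguments are big-endian IEEE 754 ('>f' in struct notation).
    args = b"".join(struct.pack(">f", float(v)) for v in values)
    return osc_string(address) + osc_string(type_tags) + args


def send_latent(values: Sequence[float], host: str = "127.0.0.1", port: int = 9000,
                address: str = "/rave/latent") -> None:
    """Send one UDP datagram; host, port, and address are hypothetical defaults."""
    packet = pack_latent_message(address, values)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(packet, (host, port))
```

Sending all 16 dimensions in a single message keeps each latent update atomic; a receiver in a Max-based patch could then unpack the list in one step rather than reassembling separate per-dimension messages.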

Paper Structure

This paper contains 5 sections and 2 figures.

Figures (2)

  • Figure 1: Proposed mapping strategy
  • Figure 2: Two screenshots of user interfaces showing the sketch and the audio synthesis model's latent spaces. A full demonstration video can be viewed at https://tinyurl.com/xaixarts24-mapping.