Explainability Paths for Sustained Artistic Practice with AI
Austin Tecks, Thomas Peschlow, Gabriel Vigliensoni
TL;DR
The paper treats the limited explainability of AI-driven generative audio as a barrier to sustained artistic practice. It proposes four concrete provisions: human-scale models, artist-curated datasets, iteration that extends beyond inference, and interactive machine learning as a latent-space mapping tool. Together, these are meant to increase artists' agency across data curation, training, and performance. A case study using RAVE trained on archival Chilean recordings walks through data preparation, staged training with a VAE and a GAN, and real-time latent-space control via facial gestures, illustrating practical paths toward steerable and temporally coherent outputs. These approaches give artists who seek long-term engagement with generative audio systems practical ways to reduce the systems' opacity.
Abstract
The development of AI-driven generative audio mirrors broader AI trends, often prioritizing immediate accessibility at the expense of explainability. Consequently, integrating such tools into sustained artistic practice remains a significant challenge. In this paper, we explore several paths to improve explainability, drawing primarily from our research-creation practice in training and implementing generative audio models. As practical provisions for improved explainability, we highlight human agency over training materials, the viability of small-scale datasets, support for the iterative creative process, and the integration of interactive machine learning as a mapping tool. Importantly, these steps aim to enhance human agency over generative AI systems not only during model inference, but also during the curation and preprocessing of training data and during model training.
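
To make the idea of interactive machine learning as a mapping tool more concrete, the following is a minimal sketch and not the implementation used in this work. It assumes that a performer records a handful of demonstration pairs, each linking a gesture feature vector (for example, two facial-tracking values) to a latent vector they liked, and that new gestures are mapped by inverse-distance-weighted blending of those demonstrations. The function names, the feature and latent dimensions, and the choice of interpolation method are all hypothetical.

```python
import math


def fit_examples(pairs):
    """Store demonstration pairs of (gesture_features, latent_vector).

    Hypothetical helper: "training" here only means remembering the examples.
    """
    return [(tuple(g), tuple(z)) for g, z in pairs]


def map_gesture(examples, gesture, eps=1e-9):
    """Map a gesture to a latent vector by inverse-distance weighting.

    Assumption: closer demonstrations should contribute more to the output.
    """
    weighted = []
    for g, z in examples:
        d = math.dist(g, gesture)
        if d < eps:
            # The gesture matches a demonstration: return its latent vector.
            return list(z)
        weighted.append((1.0 / d, z))
    total = sum(w for w, _ in weighted)
    dim = len(weighted[0][1])
    return [sum(w * z[i] for w, z in weighted) / total for i in range(dim)]


# Three illustrative demonstrations: 2-D gesture features -> 3-D latent vectors.
examples = fit_examples([
    ((0.0, 0.0), (-1.0, 0.0, 0.5)),
    ((1.0, 0.0), (1.0, 0.0, -0.5)),
    ((0.0, 1.0), (0.0, 2.0, 0.0)),
])

# A gesture halfway between the first two demonstrations.
latent = map_gesture(examples, (0.5, 0.0))
```

Because the mapping is defined entirely by examples the artist supplied, it can be inspected and revised by adding, removing, or re-recording individual pairs. In a real-time setting, the resulting latent vector would be passed to the generative model's decoder on each control frame.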
