SynthScribe: Deep Multimodal Tools for Synthesizer Sound Retrieval and Exploration
Stephen Brade, Bryan Wang, Mauricio Sousa, Gregory Lee Newsome, Sageev Oore, Tovi Grossman
TL;DR
SynthScribe is a full-stack multimodal system for synthesizer sound retrieval, modification, and creation. It uses LAION-CLAP to connect text and audio queries with a bank of presets for the Diva synthesizer. By combining multimodal search, a user-centered genetic mixing algorithm, and a text- or audio-driven preset modification interface, the approach gives users high-level control over timbre without retraining a model for each synthesizer. Two user studies demonstrate reliable sound retrieval and meaningful modifications, and observations from free use show time savings and the discovery of surprising, original sounds. The work points to a practical way of democratizing sound design, letting both novices and professionals explore and invent timbres efficiently. Future work focuses on personalization, cross-synthesizer support, and timbre-language customization.
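The retrieval idea summarized above, ranking presets by how close their embeddings are to a query embedding, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the three-dimensional vectors are toy stand-ins for real CLAP embeddings, the preset names are invented, and `cosine` and `rank_presets` are hypothetical helpers. In a real system the query would be embedded from text or audio and each preset's embedding would come from audio rendered from that preset.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def rank_presets(query_embedding, preset_embeddings, top_k=3):
    """Return the top_k (name, score) pairs, most similar first."""
    scored = [
        (name, cosine(query_embedding, emb))
        for name, emb in preset_embeddings.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy stand-ins for precomputed preset embeddings (assumed, not real data).
bank = {
    "warm_pad": [0.9, 0.1, 0.0],
    "bright_lead": [0.1, 0.9, 0.2],
    "deep_bass": [0.0, 0.2, 0.9],
}

# Stand-in for the embedding of a text or audio query.
query = [0.8, 0.2, 0.1]
results = rank_presets(query, bank, top_k=2)
```

Because text and audio share one embedding space, the same ranking function would serve both query types, which is why no per-synthesizer model is needed for search.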
Abstract
Synthesizers are powerful tools that allow musicians to create dynamic and original sounds. Existing commercial interfaces for synthesizers typically require musicians to interact with complex low-level parameters or to manage large libraries of premade sounds. To address these challenges, we implement SynthScribe, a full-stack system that uses multimodal deep learning to let users express their intentions at a much higher level. We implement features that address three difficulties: 1) searching through existing sounds, 2) creating completely new sounds, and 3) making meaningful modifications to a given sound. This is achieved with three main features: a multimodal search engine for a large library of synthesizer sounds; a user-centered genetic algorithm by which completely new sounds can be created and selected according to the user's preferences; and a sound-editing support feature that highlights, and gives examples for, key control parameters with respect to a text- or audio-based query. The results of our user studies show that SynthScribe is capable of reliably retrieving and modifying sounds while also affording the ability to create completely new sounds that expand a musician's creative horizon.
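To make the user-centered genetic algorithm more concrete, here is a minimal sketch of one generation step. The abstract does not specify the operators, so everything here is an assumption for illustration: presets are modeled as dictionaries of parameters normalized to [0, 1], children are produced by uniform crossover followed by a small Gaussian mutation, and the parameter names and the `crossover`, `mutate`, and `next_generation` helpers are hypothetical. The "user-centered" part is represented by the `selected` list, which stands for the presets a user chose to keep.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Uniform crossover: each parameter comes from one parent at random."""
    return {
        key: (parent_a[key] if rng.random() < 0.5 else parent_b[key])
        for key in parent_a
    }


def mutate(preset, rng, rate=0.1, scale=0.05):
    """Nudge some parameters with Gaussian noise, clipped to [0, 1]."""
    child = {}
    for key, value in preset.items():
        if rng.random() < rate:
            value = min(1.0, max(0.0, value + rng.gauss(0.0, scale)))
        child[key] = value
    return child


def next_generation(selected, size, rng):
    """Breed `size` children from the presets the user selected."""
    children = []
    for _ in range(size):
        parent_a, parent_b = rng.sample(selected, 2)
        children.append(mutate(crossover(parent_a, parent_b, rng), rng))
    return children


# Hypothetical presets with made-up, normalized parameter values.
selected = [
    {"cutoff": 0.2, "resonance": 0.7, "attack": 0.1},
    {"cutoff": 0.8, "resonance": 0.3, "attack": 0.5},
    {"cutoff": 0.5, "resonance": 0.5, "attack": 0.9},
]

rng = random.Random(0)  # seeded so the sketch is reproducible
generation = next_generation(selected, size=4, rng=rng)
```

In an interactive loop, the user would audition `generation`, pick favorites, and pass them back in as the next `selected` list, so listening replaces a numeric fitness function.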
