Exploring the User Experience of AI-Assisted Sound Searching Systems for Creative Workflows
Haohe Liu, Thomas Deacon, Wenwu Wang, Matt Paradis, Mark D. Plumbley
TL;DR
This work addresses efficient sound effect retrieval in creative workflows by introducing CLAP-UI, an annotation-free search system built on contrastive language-audio pre-training (CLAP) that matches natural language queries against audio embeddings. In a two-stage study with expert users, CLAP-UI was compared against the BBC-SFX-UI, the search interface of the BBC Sound Effect Library. CLAP-UI yielded significantly higher productivity and lower frustration while requiring comparable cognitive effort. The findings show that semantic, language-grounded search can improve relevance and speed in professional sound sourcing, with notable gains on targeted prompts, although limitations remain in audio data fidelity and metadata. The study highlights CLAP-UI’s potential to streamline audio production and foster creative exploration. It also outlines future directions for real-world adoption: higher-quality datasets, richer metadata, and improved explainability.
Abstract
Locating the right sound effect efficiently is an important yet challenging task in audio production. Most current sound-searching systems rely on pre-annotated audio labels created by humans, which are time-consuming to produce and prone to inaccuracies, limiting the efficiency of audio production. Building on recent advances in contrastive language-audio pre-training (CLAP) models, we explore an alternative CLAP-based sound-searching system (CLAP-UI) that does not rely on human annotations. To evaluate the effectiveness of CLAP-UI, we conducted comparative experiments against a widely used sound effect search platform, the BBC Sound Effect Library. Our study evaluates user performance, cognitive load, and satisfaction through ecologically valid tasks based on professional sound-searching workflows. Our results show that CLAP-UI significantly enhanced productivity and reduced frustration while maintaining comparable cognitive demands. We also qualitatively analyzed participants' feedback, which offers valuable perspectives on the design of future AI-assisted sound search systems.
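The annotation-free retrieval idea can be sketched as follows, under stated assumptions: a CLAP model has already mapped every library sound and the user's text query into one shared embedding space, and sounds are ranked by cosine similarity to the query, as is common for contrastive models. The paper's exact scoring may differ. The vectors below are hypothetical 3-dimensional toy embeddings rather than real CLAP outputs, which are much higher-dimensional. The function name `rank_sounds` and the file names are illustrative only, not part of CLAP-UI or any CLAP library.

```python
import math
from typing import Dict, List, Sequence, Tuple


def l2_normalize(vec: Sequence[float]) -> List[float]:
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]


def rank_sounds(
    query_embedding: Sequence[float],
    library: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k (sound_id, cosine score) pairs, best match first.

    Assumption: query and library embeddings live in the same shared
    text-audio space, as produced by a contrastive language-audio model.
    No human-written labels are consulted; only embeddings are compared.
    """
    q = l2_normalize(query_embedding)
    scored = []
    for sound_id, audio_embedding in library.items():
        a = l2_normalize(audio_embedding)
        score = sum(qi * ai for qi, ai in zip(q, a))
        scored.append((sound_id, score))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical toy embeddings standing in for precomputed CLAP audio embeddings.
library = {
    "rain_on_window.wav": [0.9, 0.1, 0.0],
    "dog_bark.wav": [0.0, 1.0, 0.1],
    "thunder_rumble.wav": [0.7, 0.0, 0.7],
}
# Hypothetical stand-in for the text embedding of a query such as "heavy rain".
query = [1.0, 0.0, 0.2]

results = rank_sounds(query, library, top_k=3)
for sound_id, score in results:
    print(f"{sound_id}: {score:.3f}")
```

Because ranking depends only on embeddings, a newly added sound becomes searchable as soon as it is embedded, with no manual labeling step. In a real system the library embeddings would typically be computed and normalized once, ahead of time, so that each query costs only one text encoding plus a similarity search.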
