Exploring the User Experience of AI-Assisted Sound Searching Systems for Creative Workflows

Haohe Liu, Thomas Deacon, Wenwu Wang, Matt Paradis, Mark D. Plumbley

TL;DR

This work tackles the challenge of efficient sound effect retrieval in creative workflows by introducing CLAP-UI, a CLAP-based, annotation-free search system that links natural language queries to audio embeddings. In a two-stage expert-user study against the BBC Sound Effect Library interface (BBC-SFX-UI), CLAP-UI demonstrated significantly higher productivity and lower frustration while maintaining comparable cognitive demands. The findings show that semantic, language-grounded search can improve relevance and speed in professional sound sourcing, with notable gains on targeted prompts, albeit with limitations tied to data fidelity and metadata. The study highlights CLAP-UI’s potential to streamline audio production and foster creative exploration, and outlines directions for higher-quality datasets, richer metadata, and improved explainability to support real-world adoption.

Abstract

Locating the right sound effect efficiently is an important yet challenging task in audio production. Most current sound-searching systems rely on audio labels annotated in advance by humans, which are time-consuming to produce and prone to inaccuracies, limiting the efficiency of audio production. Building on recent advances in contrastive language-audio pre-training (CLAP) models, we explore an alternative CLAP-based sound-searching system (CLAP-UI) that does not rely on human annotations. To evaluate the effectiveness of CLAP-UI, we conducted comparative experiments against a widely used sound effect searching platform, the BBC Sound Effect Library. Our study evaluates user performance, cognitive load, and satisfaction through ecologically valid tasks based on professional sound-searching workflows. Our results show that CLAP-UI significantly enhanced productivity and reduced frustration while maintaining comparable cognitive demands. We also qualitatively analyzed the participants' feedback, which offered valuable perspectives on the design of future AI-assisted sound search systems.

Paper Structure

This paper contains 25 sections, 3 equations, 5 figures, 2 tables.

Figures (5)

  • Figure 1: The user interface of our CLAP-based sound searching system.
  • Figure 2: Overall script difficulty distribution ($p=1.14 \times 10^{-8}, r=0.416$).
  • Figure 3: Difficulty rating for each of the sound effect scripts. Red boxes mark the statistically significant results after the Bonferroni correction.
  • Figure 4: Participant ratings for Q1 and Q2.
  • Figure 5: Task load index evaluation result.