SoundMorpher: Perceptually-Uniform Sound Morphing with Diffusion Model

Xinlei Niu, Jing Zhang, Charles Patrick Martin

TL;DR

SoundMorpher tackles the challenge of perceptually uniform sound morphing by leveraging a pre-trained diffusion model (AudioLDM2) and introducing Sound Perceptual Distance Proportion (SPDP) to map morph factors to perceptual change. The method combines conditional embedding and latent-space interpolation with LoRA-based model adaptation and a binary-search procedure to enforce constant perceptual increments along the morph path. It provides an objective evaluation framework based on established morphing criteria and demonstrates superior performance across timbral, environmental, and musical morphing tasks, with ablations guiding design choices. The approach promises broad applicability to creative audio workflows and sets the stage for future work in higher-fidelity synthesis and voice morphing.

Abstract

We present SoundMorpher, an open-world sound morphing method designed to generate perceptually uniform morphing trajectories. Traditional sound morphing techniques typically assume a linear relationship between the morphing factor and sound perception, achieving smooth transitions by linearly interpolating the semantic features of source and target sounds while gradually adjusting the morphing factor. However, these methods oversimplify the complexities of sound perception, resulting in limitations in morphing quality. In contrast, SoundMorpher explores an explicit relationship between the morphing factor and the perception of morphed sounds, leveraging log Mel-spectrogram features. This approach further refines the morphing sequence by ensuring a constant target perceptual difference for each transition and determining the corresponding morphing factors using binary search. To address the lack of a formal quantitative evaluation framework for sound morphing, we propose a set of metrics based on three established objective criteria. These metrics enable comprehensive assessment of morphed results and facilitate direct comparisons between methods, fostering advancements in sound morphing research. Extensive experiments demonstrate the effectiveness and versatility of SoundMorpher in real-world scenarios, showcasing its potential in applications such as creative music composition, film post-production, and interactive audio technologies. Our demonstration and code are available at https://xinleiniu.github.io/SoundMorpher-demo/.
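
The binary-search step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes a hypothetical function `perceived(alpha)` that returns the perceptual proportion (between 0 and 1) of a morph produced with factor `alpha`, and that this function is non-decreasing. In SoundMorpher the proportion would come from log Mel-spectrogram features of generated audio; here a toy smoothstep curve stands in for it, and all function names are illustrative.

```python
from typing import Callable, List


def find_alpha(perceived: Callable[[float], float], target: float,
               tol: float = 1e-4, max_iter: int = 50) -> float:
    """Binary-search a morph factor in [0, 1] whose perceived proportion
    is within `tol` of `target`. Assumes `perceived` is non-decreasing."""
    lo, hi = 0.0, 1.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        value = perceived(mid)
        if abs(value - target) <= tol:
            return mid
        if value < target:
            lo = mid   # too close to the source: move toward the target sound
        else:
            hi = mid   # too close to the target: move back toward the source
    return 0.5 * (lo + hi)


def uniform_morph_factors(perceived: Callable[[float], float],
                          n_steps: int, tol: float = 1e-4) -> List[float]:
    """Return `n_steps` morph factors whose perceived proportions are
    (approximately) evenly spaced; the endpoints are fixed at 0 and 1."""
    if n_steps < 2:
        raise ValueError("n_steps must be at least 2")
    interior = [find_alpha(perceived, i / (n_steps - 1), tol)
                for i in range(1, n_steps - 1)]
    return [0.0] + interior + [1.0]


def toy_perceived(alpha: float) -> float:
    """Hypothetical stand-in for a perceptual measurement (smoothstep):
    perception changes slowly near the endpoints and quickly in the middle."""
    return alpha * alpha * (3.0 - 2.0 * alpha)


alphas = uniform_morph_factors(toy_perceived, n_steps=5)
print([round(a, 3) for a in alphas])
```

With a nonlinear perception curve such as this one, the returned factors are unevenly spaced in `alpha` even though the perceived steps are roughly equal, which is the motivation for not interpolating the morph factor linearly.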

Paper Structure

This paper contains 17 sections, 10 equations, 7 figures, 6 tables, 2 algorithms.

Figures (7)

  • Figure 1: Overview of the SoundMorpher pipeline. The snowflake icon indicates that the model parameters are frozen.
  • Figure 2: Timbre space visualization of morph trajectories for piano-organ timbre morphing. Compared to SMT, SoundMorpher produces a smoother, more continuous morph in the timbre space.
  • Figure 3: Visualization of timbre morphing for musical instruments with $N = 11$ by SoundMorpher compared to SMT.
  • Figure 4: Spectrogram visualization of morphed results compared with MorphFader. SoundMorpher appears to provide a more seamless and stable morphing process, in which transitions are smoother and the spectral content is more consistent across the morphing stages.
  • Figure 5: Visualization of environmental sound morphing with $N=5$, from top to bottom: (1) church bells $\leftrightarrow$ clock alarm, (2) crying baby $\leftrightarrow$ laughing, (3) cat $\leftrightarrow$ dog, (4) clapping $\leftrightarrow$ wood door knocking.
  • ...and 2 more figures