Learning Control of Neural Sound Effects Synthesis from Physically Inspired Models
Yisu Zong, Joshua Reiss
TL;DR
The paper tackles the challenge of achieving both realism and intuitive control in real-time sound effects synthesis. It proposes a two-stage neural framework guided by a physically inspired explosion representation. The framework uses FiLM conditioning and a latent discriminator to obtain disentangled control. It also explores two strategies for aligning synthetic sounds with real-world audio: supervised pseudo-labeling and unsupervised CycleGAN-based transfer. Results show that supervised transfer delivers strong control fidelity within the parameter range of the physically inspired model (PM), while unsupervised transfer provides robust audio quality across a broader parameter space. Together, these results illustrate a practical way to combine physics priors with neural synthesis for tunable, high-fidelity sound design. The work is potentially relevant to game audio, film post-production, and real-time sound design, where both controllability and realism are crucial.
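FiLM (feature-wise linear modulation) conditions a network by predicting a per-channel scale (gamma) and shift (beta) from a control vector, then applying `gamma * h + beta` to the intermediate features `h`. The Python/NumPy sketch below illustrates only this generic technique. It is not the authors' architecture. The class name `FiLM`, the single affine projection, the random initialization, and the sizes used (three controls, eight channels) are assumptions for illustration. In the paper's setting, the control vector would carry the physically inspired model's parameters.

```python
import numpy as np


class FiLM:
    """Generic feature-wise linear modulation: out = gamma(c) * h + beta(c).

    Illustrative sketch only (not the paper's implementation): a single affine
    layer maps a control vector c, e.g. normalized physical-model parameters,
    to one scale and one shift per feature channel.
    """

    def __init__(self, n_controls: int, n_channels: int, seed: int = 0):
        rng = np.random.default_rng(seed)
        # Small random projection weights for the scale (gamma) and shift (beta) heads.
        self.w_gamma = rng.normal(0.0, 0.1, size=(n_controls, n_channels))
        self.w_beta = rng.normal(0.0, 0.1, size=(n_controls, n_channels))
        # Bias gamma to 1 and beta to 0 so an all-zero control leaves features unchanged.
        self.b_gamma = np.ones(n_channels)
        self.b_beta = np.zeros(n_channels)

    def __call__(self, h: np.ndarray, c: np.ndarray) -> np.ndarray:
        """h: (batch, channels, frames) features; c: (batch, n_controls) controls."""
        gamma = c @ self.w_gamma + self.b_gamma  # (batch, channels)
        beta = c @ self.w_beta + self.b_beta     # (batch, channels)
        # Broadcast the per-channel scale and shift across the time (frames) axis.
        return gamma[:, :, None] * h + beta[:, :, None]


# Usage: two examples, each with three hypothetical control values in [0, 1].
film = FiLM(n_controls=3, n_channels=8)
h = np.random.default_rng(1).normal(size=(2, 8, 100))
c = np.array([[0.2, 0.5, 0.9], [0.0, 0.0, 0.0]])
out = film(h, c)
print(out.shape)  # (2, 8, 100)
```

Initializing the scale bias to 1 and the shift bias to 0 makes the layer start close to an identity mapping. This is a common choice that lets training gradually learn how much each control should reshape the features.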
Abstract
Sound effects models are commonly designed with digital signal processing techniques that offer full control, but achieving realism with a limited number of parameters is difficult. Neural sound effects synthesis has recently emerged as a promising approach for generating high-quality, realistic sounds, but controlling such models to synthesize a desired sound remains difficult. This paper presents a real-time neural synthesis model guided by a physically inspired model. The model generates high-quality sounds while inheriting the control interface of the physically inspired model. We demonstrate the superior performance of our model in terms of sound quality and control.
