Interactive tools for making temporally variable, multiple-attributes, and multiple-instances morphing accessible: Flexible manipulation of divergent speech instances for explorational research and education
Hideki Kawahara, Masanori Morise
TL;DR
This work addresses the complexity barrier in generalized voice morphing by extending WORLD-based tools to support temporally variable, multi-attribute, and multi-instance morphing. It formalizes the generalized morphing framework using transformed-domain interpolation and introduces practical WORLD-based tools, including an integrated GUI for preparing morphing objects and an App Designer demonstration of three-instance morphing. Key contributions include an anchor-assignment and time-frequency alignment workflow, step-by-step guidance, and open-source deployment with tutorials to support education and exploratory research. The approach enables accessible, hands-on exploration of speech production and perception, and leaves open pathways to integrating neural vocoders and end-to-end latent-variable representations while preserving user-driven control over morphing parameters.
Abstract
We generalized a voice morphing algorithm to handle temporally variable, multi-attribute, and multi-instance morphing. Generalized morphing provides a new strategy for investigating speech diversity. However, its complexity and the difficulty of preparing morphing material have prevented researchers and students from benefiting from it. To address this issue, we introduced a set of interactive tools that make preparation and testing less cumbersome. These tools extend our previously reported interactive tools. We used the extended tools successfully in graduate-level lessons. Finally, we outline further extensions for exploring excessively complex morphing parameter settings.
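As a rough illustration of transformed-domain interpolation across multiple instances with temporally variable weights, the following minimal Python sketch morphs a single attribute frame by frame. It is not the authors' WORLD-based implementation: the function name `morph_attribute`, the choice of F0 with a logarithmic transform, and the sample values are assumptions made for this example, and the instances are assumed to be already time-aligned.

```python
import math


def morph_attribute(values, weights, forward=math.log, inverse=math.exp):
    """Morph one attribute across several time-aligned instances.

    values[k][t]  : attribute of instance k at frame t (here, F0 in Hz)
    weights[k][t] : morphing weight of instance k at frame t

    Hypothetical helper for illustration only. The weights at each frame
    are assumed to sum to 1; weights outside [0, 1] would extrapolate
    rather than interpolate.
    """
    n_frames = len(values[0])
    morphed = []
    for t in range(n_frames):
        # Weighted sum in the transformed (log) domain, then map back.
        acc = sum(w[t] * forward(v[t]) for v, w in zip(values, weights))
        morphed.append(inverse(acc))
    return morphed


# Three instances with constant F0 (illustrative values, not real data).
f0 = [
    [100.0, 100.0, 100.0],
    [200.0, 200.0, 200.0],
    [400.0, 400.0, 400.0],
]

# Temporally variable weights: frame 0 uses only instance 0, frame 1 is
# an even blend of instances 0 and 1, frame 2 uses only instance 2.
weights = [
    [1.0, 0.5, 0.0],
    [0.0, 0.5, 0.0],
    [0.0, 0.0, 1.0],
]

morphed_f0 = morph_attribute(f0, weights)
print(morphed_f0)
```

In a multi-attribute setting, each attribute would get its own weight table and, where appropriate, its own transform pair, so that one attribute can be morphed toward one instance while another follows a different trajectory.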
