Proposal of protocols for speech materials acquisition and presentation assisted by tools based on structured test signals
Hideki Kawahara, Ken-Ichi Sakakibara, Mitsunori Mizumachi, Kohei Yatabe
TL;DR
The paper addresses the challenge of acquiring and presenting speech materials so that they remain usable across diverse studies and real-world contexts. It introduces a framework based on structured test signals, built on Time-Stretched Pulses (CAPRICEP) and the RHAPSODEE/RAPHSODEE methodology, that simultaneously extracts LTI, RTV, and SDTI impulse responses and thereby enables objective assessment of acquisition and presentation conditions. The authors provide measurement tools (GUI-based, implemented in MATLAB) and open-source materials for computing acoustic attributes such as HNR, reverberation parameters, and direct-to-indirect sound ratios, plus a simple, field-friendly test signal for annotating recording conditions. The work aims to make rigorous acoustic measurement widely accessible, connecting laboratory protocols with everyday speech data and improving material reuse in under-resourced environments through scalable computation and open resources.
Abstract
We propose protocols for acquiring speech materials, making them reusable for future investigations, and presenting them in subjective experiments. We also provide means to evaluate the compatibility of existing speech materials with target applications. We built these protocols and tools on structured test signals and analysis methods, including a new family of Time-Stretched Pulses (TSPs). Computational resources (including those for software development) that are over a billion times more powerful than those of half a century ago make these protocols and tools accessible to under-resourced environments.
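As a rough illustration of how a periodic, all-pass-type test signal can yield an impulse response, the following Python sketch builds one period of a unit-magnitude, random-phase signal, passes it through a simulated system, and recovers the response by spectral division. This is a generic textbook technique under stated assumptions, not CAPRICEP and not the authors' MATLAB tools: the function names, the simulated "room" response, and all parameter values are hypothetical, and the measurement is assumed to be noise-free and in periodic steady state.

```python
import numpy as np


def make_allpass_test_signal(n, rng):
    """Return one period of a unit-magnitude, random-phase signal (hypothetical helper)."""
    phase = rng.uniform(-np.pi, np.pi, n // 2 + 1)
    phase[0] = 0.0            # the DC bin must be real
    if n % 2 == 0:
        phase[-1] = 0.0       # the Nyquist bin must be real for even n
    return np.fft.irfft(np.exp(1j * phase), n)


def estimate_impulse_response(recorded_period, test_period):
    """Circular deconvolution: divide the output spectrum by the input spectrum."""
    n = len(test_period)
    transfer = np.fft.rfft(recorded_period) / np.fft.rfft(test_period)
    return np.fft.irfft(transfer, n)


rng = np.random.default_rng(0)
n = 1024
x = make_allpass_test_signal(n, rng)

# Hypothetical system: a direct path followed by an exponentially decaying tail.
h_true = np.zeros(n)
h_true[0] = 1.0
lag = np.arange(1, 200)
h_true[1:200] = 0.3 * rng.standard_normal(lag.size) * np.exp(-lag / 40.0)

# In periodic steady state, the observed period is a circular convolution.
y = np.fft.irfft(np.fft.rfft(x) * np.fft.rfft(h_true), n)

h_est = estimate_impulse_response(y, x)
max_error = np.max(np.abs(h_est - h_true))

# Direct-to-indirect energy ratio, treating sample 0 as the direct sound (assumption).
direct_energy = h_est[0] ** 2
indirect_energy = np.sum(h_est[1:] ** 2)
direct_to_indirect_db = 10.0 * np.log10(direct_energy / indirect_energy)

print(f"max reconstruction error: {max_error:.2e}")
print(f"direct-to-indirect ratio: {direct_to_indirect_db:.2f} dB")
```

Because the test signal has a flat magnitude spectrum, the spectral division is well conditioned at every frequency. A real measurement would additionally need averaging over periods and handling of background noise, which this sketch omits.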
