Proposal of protocols for speech materials acquisition and presentation assisted by tools based on structured test signals

Hideki Kawahara; Ken-Ichi Sakakibara; Mitsunori Mizumachi; Kohei Yatabe

TL;DR

The paper tackles the challenge of acquiring and presenting speech materials that remain usable across diverse studies and real-world contexts. It introduces a framework based on structured test signals: CAPRICEP, a new family of Time-Stretched Pulses (TSP), and the RHAPSODEE measurement method, which simultaneously extracts the LTI, RTV, and SDTI impulse responses and so enables objective assessment of acquisition and presentation conditions. The authors provide MATLAB-based GUI measurement tools and open-source materials for computing acoustic attributes such as HNR, reverberation parameters, and direct-to-indirect sound ratios, plus a simple, field-friendly test signal for annotating recording conditions. The work aims to make rigorous acoustic measurement widely accessible, bridging laboratory protocols with everyday speech data and improving material reuse in under-resourced environments through scalable computation and open resources.

Abstract

We propose protocols for acquiring speech materials, making them reusable for future investigations, and presenting them in subjective experiments. We also provide means to evaluate the compatibility of existing speech materials with target applications. We built these protocols and tools on structured test signals and analysis methods, including a new family of the Time-Stretched Pulse (TSP). Computational resources (including those for software development) more than a billion times as powerful as those of half a century ago have made these protocols and tools accessible to under-resourced environments.
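As a rough illustration of what an impulse response measurement with a structured test signal involves, the sketch below uses a classic quadratic-phase TSP, not the CAPRICEP signals or the RHAPSODEE procedure of the paper, which are not reproduced here. The function names `make_tsp` and `measure_impulse_response`, the signal length, and the simulated decaying response are all assumptions made for this example. It covers only the LTI case with periodic (circular) playback and no noise.

```python
import numpy as np


def make_tsp(n_fft: int, stretch: int) -> np.ndarray:
    """Classic TSP: flat magnitude spectrum with quadratic phase.

    Hypothetical helper for illustration; this is NOT CAPRICEP.
    """
    k = np.arange(n_fft // 2 + 1)
    # With an integer `stretch`, the phase at the Nyquist bin is a
    # multiple of pi, so the half spectrum maps to a real signal.
    phase = -4.0 * np.pi * stretch * (k / n_fft) ** 2
    return np.fft.irfft(np.exp(1j * phase), n_fft)


def measure_impulse_response(played: np.ndarray, recorded: np.ndarray) -> np.ndarray:
    """Recover one period of the response by spectral division."""
    spectrum_ratio = np.fft.rfft(recorded) / np.fft.rfft(played)
    return np.fft.irfft(spectrum_ratio, len(played))


n_fft = 1024
tsp = make_tsp(n_fft, stretch=n_fft // 4)

# Simulated system under test (an assumption): a short decaying response.
rng = np.random.default_rng(0)
h_true = rng.standard_normal(64) * np.exp(-np.arange(64) / 8.0)

# Periodic playback corresponds to circular convolution with the system.
recorded = np.fft.irfft(np.fft.rfft(tsp) * np.fft.rfft(h_true, n_fft), n_fft)

h_est = measure_impulse_response(tsp, recorded)
print("max abs error:", np.max(np.abs(h_est[:64] - h_true)))
```

Because the test signal has unit magnitude in every frequency bin, the spectral division is well conditioned. In a real recording, background noise and nonlinearity would add to the estimate and would need separate treatment.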

Paper Structure (18 sections, 6 figures, 1 table)

Introduction
Background
Assessment of acoustic conditions
Relevant attributes
Acquisition related attributes
Presentation related attributes
RHAPSODEE and measurement
Impulse response measurements
Linear convolution for LTI: top gray block of Fig. 3
Circular convolution for RTV: middle gray block of Fig. 3
Multiple signals for SDTI: bottom gray block of Fig. 3
Measurement of acoustic attributes
Tools for acoustic measurement
Examples
Simple test signal for annotation of recording condition
...and 3 more sections

Figures (6)

Figure 1: Contributing factors affecting speech material acquisition (adopted from Kawahara2023smac).
Figure 2: Contributing factors affecting speech presentation.
Figure 3: Schematic diagram of RHAPSODEE. This diagram is a refined version of Fig. 1 in Kawahara2023apsipa. The rightmost gray frame represents the test signal for the simultaneous measurement. In the figure, "MLS: Maximum Length Sequence" and "Swept-sine" are commonly used test signals for impulse response measurements (Aoshima1981jasa; Rife1989aes). The term "TSP: Time Stretched Pulse" covers both, and our CAPRICEP (kawahara2021icassp) is a new family member of TSP. Black dots indicate where output is available.
Figure 4: Assessment setup for input system test. Connection from R-out to R-in is not compulsory.
Figure 5: GUI snapshot of the interactive tool for acoustic condition assessment. Notes set in Times Roman explain the lines.
...and 1 more figures
