Proposal of protocols for speech materials acquisition and presentation assisted by tools based on structured test signals

Hideki Kawahara, Ken-Ichi Sakakibara, Mitsunori Mizumachi, Kohei Yatabe

TL;DR

The paper tackles the challenge of obtaining and presenting speech materials that remain usable across diverse studies and real-world contexts. It introduces a framework based on structured test signals, built on CAPRICEP (a new family of Time-Stretched Pulses) and the RHAPSODEE methodology, which extracts LTI, RTV, and SDTI impulse responses simultaneously and so enables objective assessment of acquisition and presentation conditions. The authors provide MATLAB-based GUI measurement tools and open-source materials for computing acoustic attributes such as $HNR$, reverberation parameters, and direct-to-indirect sound ratios, plus a simple field-friendly test signal for annotating recording conditions. The work aims to make rigorous acoustic measurement widely available, bridging laboratory protocols and everyday speech data and improving material reuse in under-resourced environments through open resources.

Abstract

We propose protocols for acquiring speech materials, making them reusable for future investigations, and presenting them in subjective experiments. We also provide means to evaluate the compatibility of existing speech materials with target applications. We built these protocols and tools on structured test signals and analysis methods, including a new family of the Time-Stretched Pulse (TSP). Computational resources (including those for software development) that are over a billion times more powerful than those of half a century ago have made these protocols and tools accessible to under-resourced environments.
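The abstract refers to TSP-based measurement without showing how a time-stretched pulse yields an impulse response. The following is a minimal sketch of the general idea only, assuming a classic quadratic-phase TSP and a simulated, noise-free, circularly convolved system. It is not the authors' CAPRICEP signal or their MATLAB tools, and the names `make_tsp`, `measure_impulse_response`, and the stretch parameter `m` are illustrative assumptions.

```python
import numpy as np


def make_tsp(n, m):
    """Return a length-n all-pass pulse with quadratic phase, plus its half spectrum.

    m is an integer stretch parameter (an assumption of this sketch); keeping it
    an integer makes the Nyquist bin real, so the time signal is exactly real.
    """
    k = np.arange(n // 2 + 1)
    spectrum = np.exp(-1j * 4.0 * np.pi * m * k**2 / n**2)
    return np.fft.irfft(spectrum, n), spectrum


def measure_impulse_response(recorded, spectrum, n):
    """Circular deconvolution of the recording by the test signal.

    The test signal has unit magnitude at every frequency, so its inverse
    filter is simply the complex conjugate of its spectrum.
    """
    return np.fft.irfft(np.fft.rfft(recorded, n) * np.conj(spectrum), n)


n = 1024
tsp, spectrum = make_tsp(n, m=n // 4)

# Hypothetical system under test: a direct path plus two reflections.
true_ir = np.zeros(n)
true_ir[[0, 40, 100]] = [1.0, 0.5, -0.25]

# Simulate the recording as the circular convolution of the TSP and the system.
recorded = np.fft.irfft(np.fft.rfft(tsp) * np.fft.rfft(true_ir), n)

estimated_ir = measure_impulse_response(recorded, spectrum, n)
max_error = np.max(np.abs(estimated_ir - true_ir))
tsp_energy = np.sum(tsp**2)
```

In this idealized setting the estimate matches the simulated response up to floating-point rounding. A real measurement would also have to handle background noise, nonlinearity, and the linear (non-circular) convolution of a played-back signal, which is the territory the paper's tools address.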

Paper Structure (18 sections, 6 figures, 1 table)

Figures (6)

  • Figure 1: Contributing factors affecting speech material acquisition (adopted from Kawahara2023smac).
  • Figure 2: Contributing factors affecting speech presentation.
  • Figure 3: Schematic diagram of RHAPSODEE. This diagram is a refined version of Fig. 1 in Kawahara2023apsipa. The rightmost gray frame represents the test signal for the simultaneous measurement. In the figure, "MLS: Maximum Length Sequence" and "Swept-sine" are commonly used test signals for impulse response measurements (Aoshima1981jasa, Rife1989aes). The term "TSP: Time Stretched Pulse" represents them, and our CAPRICEP (kawahara2021icassp) is a new family member of TSP. Black dots indicate where output is available.
  • Figure 4: Assessment setup for input system test. Connection from R-out to R-in is not compulsory.
  • Figure 5: GUI snapshot of the interactive tool for acoustic condition assessment. Times-Roman notes explain lines.
  • ...and 1 more figure