
Reverberation-Robust Localization of Speakers Using Distinct Speech Onsets and Multi-channel Cross-Correlations

Shoufeng Lin

Abstract

Many speaker localization methods can be found in the literature. However, speaker localization under strong reverberation remains a major challenge in real-world applications. This paper proposes two algorithms for localizing speakers using microphone array recordings of reverberated sounds. To separate concurrent speakers, the first algorithm decomposes microphone signals spectrotemporally into subbands via an auditory filterbank. To suppress reverberation, we propose a novel speech onset detection approach derived from the speech signal and impulse response models, and further propose to formulate the multi-channel cross-correlation coefficient (MCCC) of encoded speech onsets in each subband. The subband results are combined to estimate the directions of arrival (DOAs) of speakers. The second algorithm extends the generalized cross-correlation phase transform (GCC-PHAT) method by using the redundant information of multiple microphones to address the reverberation problem. The proposed methods have been evaluated under adverse conditions using not only simulated signals (reverberation time $T_{60}$ of up to $1$ s) but also recordings in a real reverberant room ($T_{60} \approx 0.65$ s). Experimental results, compared against several state-of-the-art localization methods, confirm that the proposed methods can reliably locate static and moving speakers in the presence of reverberation.
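The abstract builds on two standard components: pairwise GCC-PHAT and the multichannel cross-correlation coefficient (MCCC). This summary does not give the paper's own formulations (how MCCC is applied to encoded subband onsets, or how GCC-PHAT is extended to many microphones), so the following is only a minimal NumPy sketch of the textbook versions, assuming integer-sample delays and raw waveforms rather than encoded onsets. The function names `gcc_phat` and `mccc` are hypothetical. MCCC is computed as $\rho^2 = 1 - \det(\tilde{\mathbf{R}})$, where $\tilde{\mathbf{R}}$ is the correlation matrix of the delay-aligned channels normalized to unit diagonal.

```python
import numpy as np


def gcc_phat(x, y, max_lag):
    """PHAT-weighted cross-correlation of two equal-length signals (sketch).

    Returns (lags, cc); a peak at a positive lag d means y is x delayed by d samples.
    """
    if max_lag < 1:
        raise ValueError("max_lag must be at least 1")
    n = 2 * len(x)  # zero-pad to avoid circular wrap-around of the correlation
    cross = np.conj(np.fft.rfft(x, n)) * np.fft.rfft(y, n)
    cross /= np.abs(cross) + 1e-12  # PHAT: keep the phase, discard the magnitude
    cc = np.fft.irfft(cross, n)
    # Negative lags sit at the end of the IFFT output; move them to the front.
    cc = np.concatenate((cc[-max_lag:], cc[: max_lag + 1]))
    return np.arange(-max_lag, max_lag + 1), cc


def mccc(signals, delays):
    """Multichannel cross-correlation coefficient for candidate delays (sketch).

    signals: (M, N) array of microphone signals.
    delays: length-M integer delays (samples) of each channel; the source is
    assumed to reach channel i delays[i] samples after the reference time.
    Returns rho in [0, 1], where rho**2 = 1 - det(R_norm).
    """
    signals = np.asarray(signals, dtype=float)
    delays = np.asarray(delays, dtype=int)
    m, n = signals.shape
    shift = delays - delays.min()
    length = n - shift.max()
    # Advance each channel by its candidate delay so a common source lines up.
    aligned = np.stack([signals[i, shift[i]:shift[i] + length] for i in range(m)])
    aligned -= aligned.mean(axis=1, keepdims=True)
    r = aligned @ aligned.T                      # spatial correlation matrix
    d = np.sqrt(np.diag(r))
    r_norm = r / np.outer(d, d)                  # unit diagonal
    return float(np.sqrt(max(0.0, 1.0 - np.linalg.det(r_norm))))
```

For two microphones this reduces to $\rho = |r_{12}|$, the ordinary normalized correlation; with more microphones the determinant rewards delay sets that are jointly consistent across all channel pairs, which is one way to exploit the multi-microphone redundancy mentioned in the abstract. A DOA estimate would then map a grid of candidate directions to delay vectors through the array geometry and keep the direction with the largest $\rho$.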

Paper Structure

This paper contains 34 sections, 76 equations, 9 figures, 1 table.

Figures (9)

  • Figure 1: Speech signal (top panel), subband signal, recursive average, and encoded subband onset signal (bottom panel).
  • Figure 2: Top view of room and set-up (simulation). Microphone and speaker locations are marked by circles and stars, respectively.
  • Figure 3: Top view of room and set-up (real-world). Microphone locations are marked by black circles. Tracks of the moving speakers are shown in blue (Speaker1), red (Speaker2), and green (Speaker3); each track starts at a solid circle and ends at a triangle.
  • Figure 4: Raw signals of moving speakers (top three panels) and a normalized real recording from one of the microphones in the real reverberant room (bottom panel).
  • Figure 5: Normalized histograms from the Onset-MCCC, MCC-PHAT and Neuro-Fuzzy methods, steered response power from the TF-CHB method and DOA estimates from the EB-ESPRIT method.
  • ...and 4 more figures
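Figure 1 shows a subband signal, its recursive average, and the encoded subband onset signal, but this summary does not give the detection rule that the paper derives from its speech and impulse-response models. As a hypothetical stand-in, the sketch below encodes onsets in one subband by comparing a fast recursive average of the signal power with a slow one, marking a sample as an onset (1) when the fast average exceeds `ratio` times the slow average. The name `encode_onsets`, the time constants, the ratio, and the floor are illustrative assumptions, and the auditory filterbank that would produce the subband input is not implemented.

```python
import numpy as np


def encode_onsets(subband, fs, tau_fast=0.005, tau_slow=0.05, ratio=1.5, floor=1e-8):
    """Binary onset code for one subband signal (hypothetical rule, not the paper's).

    Two first-order recursive averages of the instantaneous power are tracked:
    a fast one (time constant tau_fast, seconds) and a slow one (tau_slow).
    A sample is coded 1 when fast > ratio * slow and fast > floor.
    Returns (code, fast_avg, slow_avg).
    """
    x = np.asarray(subband, dtype=float)
    a_fast = np.exp(-1.0 / (tau_fast * fs))
    a_slow = np.exp(-1.0 / (tau_slow * fs))
    fast = slow = 0.0
    fast_avg = np.empty_like(x)
    slow_avg = np.empty_like(x)
    for n, p in enumerate(x * x):  # p is the instantaneous power
        fast = a_fast * fast + (1.0 - a_fast) * p
        slow = a_slow * slow + (1.0 - a_slow) * p
        fast_avg[n], slow_avg[n] = fast, slow
    code = ((fast_avg > ratio * slow_avg) & (fast_avg > floor)).astype(np.int8)
    return code, fast_avg, slow_avg
```

Only rising power can trigger this rule: falling power, such as a decaying reverberant tail, pulls the fast average below the slow one, so it produces no onsets. In the abstract's first algorithm, the per-channel onset codes of one subband are what the MCCC step would then correlate across microphones.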