Steered Response Power for Sound Source Localization: A Tutorial Review

Eric Grinstein, Elisa Tengan, Bilgesu Çakmak, Thomas Dietzen, Leonardo Nunes, Toon van Waterschoot, Mike Brookes, Patrick A. Naylor

TL;DR

This work surveys the Steered Response Power (SRP) framework for sound source localization (SSL), with a focus on the SRP-PHAT variant and the modular X-SRP implementation. It formalizes SRP in the time and frequency domains, explains the geometry of the time difference of arrival (TDOA), and demonstrates a grid-search approach for locating one or more sources, while addressing computational complexity and robustness. The paper catalogs extensions drawn from over 200 papers, covering complexity reduction, robustness improvements, multi-source handling, tracking, and practical deployments, and provides a unified, extensible software platform (X-SRP) to facilitate replication and experimentation. Its analysis highlights that SRP remains competitive in moderately reverberant and noisy environments and is a versatile foundation that can be augmented with neural components, prior information, and sparse or multi-target techniques for scalable SSL in real-world settings.

Abstract

In the last three decades, the Steered Response Power (SRP) method has been widely used for the task of Sound Source Localization (SSL), due to its satisfactory localization performance in moderately reverberant and noisy scenarios. Many works have analyzed and extended the original SRP method to reduce its computational cost, to allow it to locate multiple sources, or to improve its performance in adverse environments. In this work, we review over 200 papers on the SRP method and its variants, with emphasis on the SRP-PHAT method. We also present eXtensible-SRP, or X-SRP, a generalized and modularized version of the SRP algorithm that allows the reviewed extensions to be implemented. We provide a Python implementation of the algorithm that includes selected extensions from the literature.

Paper Structure (43 sections, 33 equations, 9 figures, 1 algorithm)

Figures (9)

  • Figure 1: Hyperbola branch of points with the same TDOA as a source located at $\mathbf{u}$ with respect to microphone positions $\mathbf{v}_1$ and $\mathbf{v}_2$.
  • Figure 2: Example comparison between the normalized temporal cross-correlation and GCC-PHAT for a scenario containing two microphones and a source producing a speech signal with a TDOA of -2 ms.
  • Figure 3: Example of an SRP map for the task of 3D DOA estimation of a speech source using a spherical array of 8 microphones. Reverberation was simulated with a reverberation time of $T_{60} = 400$ ms, and the source is located below the transparent triangle at $(100^\circ, 60^\circ)$. Spatially uncorrelated white noise was added to the microphones at 20 dB SNR.
  • Figure 4: Comparison between SRP maps generated with (bottom) and without (top) volumetric techniques.
  • Figure 5: Low-pass version of the frequency-domain SRP, where only frequencies up to 200 Hz are considered.
  • ...and 4 more figures
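
The ideas behind Figures 1 and 2 can be illustrated with a minimal sketch. It is not taken from the paper's X-SRP code; the function names, the sign convention $\tau = (\lVert\mathbf{u}-\mathbf{v}_1\rVert - \lVert\mathbf{u}-\mathbf{v}_2\rVert)/c$, the speed of sound of 343 m/s, and the toy signal are all assumptions made for illustration. The first part checks that two points on the same hyperbola branch share one TDOA; the second part computes a GCC-PHAT function for a circularly delayed noise signal, using a naive DFT so that only the standard library is needed.

```python
import cmath
import math
import random

SPEED_OF_SOUND = 343.0  # m/s (assumed value)

def tdoa(u, v1, v2, c=SPEED_OF_SOUND):
    """TDOA in seconds under the assumed convention (|u - v1| - |u - v2|) / c."""
    return (math.dist(u, v1) - math.dist(u, v2)) / c

# Microphones at (-d, 0) and (d, 0). Points (a*cosh t, b*sinh t) with
# b = sqrt(d^2 - a^2) lie on the hyperbola branch with range difference 2a.
d, a = 0.5, 0.2
b = math.sqrt(d * d - a * a)
v1, v2 = (-d, 0.0), (d, 0.0)
u = (a * math.cosh(0.7), b * math.sinh(0.7))
other = (a * math.cosh(-1.3), b * math.sinh(-1.3))
tau_u, tau_other = tdoa(u, v1, v2), tdoa(other, v1, v2)

def dft(x, inverse=False):
    """Naive O(N^2) DFT; adequate for the short toy signal used here."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = sum(x[m] * cmath.exp(sign * 2j * cmath.pi * k * m / n)
                  for m in range(n))
        out.append(acc / n if inverse else acc)
    return out

def gcc_phat(x1, x2, eps=1e-12):
    """Circular GCC-PHAT: whiten the cross-spectrum, then transform back."""
    X1, X2 = dft(x1), dft(x2)
    cross = [p * q.conjugate() for p, q in zip(X1, X2)]
    whitened = [g / max(abs(g), eps) for g in cross]
    return [v.real for v in dft(whitened, inverse=True)]

def peak_lag(cc):
    """Index of the maximum, mapped to a signed lag in samples."""
    n = len(cc)
    k = max(range(n), key=lambda i: cc[i])
    return k if k < n // 2 else k - n

random.seed(0)
n, delay = 64, 5
x1 = [random.gauss(0.0, 1.0) for _ in range(n)]
x2 = [x1[(i - delay) % n] for i in range(n)]  # microphone 2 hears x1 later
lag = peak_lag(gcc_phat(x1, x2))
```

Under these conventions, a signal that reaches microphone 2 later than microphone 1 gives a negative peak lag, which matches the sign of `tdoa` for a source closer to microphone 1. A practical implementation would use an FFT and zero-padding rather than the naive circular DFT shown here.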