Steered Response Power for Sound Source Localization: A Tutorial Review

Eric Grinstein; Elisa Tengan; Bilgesu Çakmak; Thomas Dietzen; Leonardo Nunes; Toon van Waterschoot; Mike Brookes; Patrick A. Naylor

TL;DR

This work surveys the Steered Response Power (SRP) framework for sound source localization (SSL), with a focus on the SRP-PHAT variant. It formalizes SRP in the time and frequency domains, clarifies the geometry of time differences of arrival (TDOAs), and describes a grid-search approach for locating one or more sources, while addressing computational complexity and robustness. The paper catalogs extensions from over 200 works, covering complexity reduction, robustness improvements, multi-source localization, tracking, and practical deployments. It also presents X-SRP, a unified, modular, and extensible generalization of SRP with a Python implementation, to facilitate replication and experimentation. Its analysis shows that SRP remains competitive in moderately reverberant and noisy environments and serves as a versatile foundation that can be augmented with neural components, prior information, and sparse or multi-target techniques for scalable SSL in real-world settings.
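
The grid search described above can be illustrated with a minimal frequency-domain SRP-PHAT sketch in NumPy. For each candidate point $\mathbf{x}$, the PHAT-weighted cross-spectrum of every microphone pair is phase-aligned using the candidate TDOA $\tau_{mn}(\mathbf{x}) = (\|\mathbf{x}-\mathbf{v}_m\| - \|\mathbf{x}-\mathbf{v}_n\|)/c$ and summed over frequency. The point with the largest power is taken as the estimate. This is not the X-SRP implementation: the function name `srp_phat_map`, the free-field simulation, the four-microphone square array, the 2D grid, and $c = 343$ m/s are all assumptions made for illustration.

```python
import numpy as np

C = 343.0  # assumed speed of sound (m/s)


def srp_phat_map(signals, mics, grid, fs, c=C, eps=1e-12):
    """Frequency-domain SRP-PHAT power for each candidate point in `grid`.

    signals: (M, N) microphone signals; mics: (M, D) positions;
    grid: (G, D) candidate source positions. Returns a (G,) array.
    """
    spectra = np.fft.rfft(signals, axis=1)                  # (M, K)
    freqs = np.fft.rfftfreq(signals.shape[1], d=1.0 / fs)   # (K,)
    # Distance from every grid point to every microphone: (G, M)
    dists = np.linalg.norm(grid[:, None, :] - mics[None, :, :], axis=2)
    power = np.zeros(len(grid))
    n_mics = len(mics)
    for m in range(n_mics):
        for n in range(m + 1, n_mics):
            cross = spectra[m] * np.conj(spectra[n])
            cross /= np.abs(cross) + eps                    # PHAT: keep phase only
            # Candidate near-field TDOA tau_mn(x) = (|x - v_m| - |x - v_n|) / c
            tau = (dists[:, m] - dists[:, n]) / c           # (G,)
            # Phase-align the pair for every candidate, then sum over frequency
            steering = np.exp(2j * np.pi * freqs[None, :] * tau[:, None])  # (G, K)
            power += np.real(steering @ cross)
    return power


# Simulated free-field scene (all values hypothetical)
rng = np.random.default_rng(0)
fs, n_samples = 16000, 2048
mics = np.array([[0.0, 0.0], [0.5, 0.0], [0.5, 0.5], [0.0, 0.5]])
source = np.array([1.2, 0.8])

# Delay a white-noise source by each microphone's time of flight
# (circular fractional delay applied in the frequency domain)
src_spec = np.fft.rfft(rng.standard_normal(n_samples))
freqs = np.fft.rfftfreq(n_samples, d=1.0 / fs)
tof = np.linalg.norm(mics - source, axis=1) / C             # (M,)
signals = np.fft.irfft(
    src_spec[None, :] * np.exp(-2j * np.pi * freqs[None, :] * tof[:, None]),
    n=n_samples, axis=1,
)

# Exhaustive 2D grid search with 0.1 m spacing
xs = np.linspace(-1.0, 2.0, 31)
gx, gy = np.meshgrid(xs, xs, indexing="ij")
grid = np.column_stack([gx.ravel(), gy.ravel()])
srp_map = srp_phat_map(signals, mics, grid, fs)
estimate = grid[np.argmax(srp_map)]
print("estimated source position:", estimate)
```

In this sketch, every grid point is evaluated for every pair and frequency bin, so the cost grows roughly as G·P·K (grid points × microphone pairs × frequency bins). This is the burden that the complexity-reduction methods surveyed in the paper aim to lower.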

Abstract

Over the last three decades, the Steered Response Power (SRP) method has been widely used for Sound Source Localization (SSL) due to its satisfactory localization performance in moderately reverberant and noisy scenarios. Many works have analyzed and extended the original SRP method to reduce its computational cost, to enable the localization of multiple sources, or to improve its performance in adverse environments. In this work, we review over 200 papers on the SRP method and its variants, with emphasis on the SRP-PHAT method. We also present eXtensible-SRP (X-SRP), a generalized and modularized version of the SRP algorithm that allows the reviewed extensions to be implemented. We provide a Python implementation of the algorithm that includes selected extensions from the literature.

Paper Structure (43 sections, 33 equations, 9 figures, 1 algorithm)

Introduction
The conventional SRP model
Problem statement and definitions
Near- versus Far-field localization
Signal model
Acoustics, TOF and TDOA
Estimating TDOA: Cross-correlation and GCC-PHAT
Time-domain SRP formulation
Frequency-domain SRP formulation
Grid construction and search
Reducing SRP's complexity and computational time
Complexity analysis
Coarse grids and Volumetric-SRP
Iterative grid refinement
Grids based on prior location estimates
...and 28 more sections

Figures (9)

Figure 1: Hyperbola branch of points with the same TDOA as a source located at $\mathbf{u}$ with respect to microphone positions $\mathbf{v}_1$ and $\mathbf{v}_2$.
Figure 2: Example comparison between the normalized temporal cross-correlation and GCC-PHAT for a scenario containing two microphones and a source producing a speech signal with a TDOA of -2 ms.
Figure 3: Example of an SRP map for 3D DOA estimation of a speech source using a spherical array of 8 microphones. Reverberation was simulated with a reverberation time of $T_{60} = 400$ ms, and the source is located below the transparent triangle at $(100^\circ, 60^\circ)$. Spatially uncorrelated white noise was added to the microphone signals at 20 dB SNR.
Figure 4: Comparison between SRP maps generated with (bottom) and without (top) volumetric techniques.
Figure 5: Low-pass version of the frequency-domain SRP, where only frequencies up to 200 Hz are considered.
...and 4 more figures
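
Figures 1 and 2 rest on the TDOA of a microphone pair. GCC-PHAT estimates this TDOA as the lag at which the phase-transformed cross-correlation peaks. The following is a minimal NumPy sketch, not the paper's X-SRP code. The function name `gcc_phat`, the sign convention $\tau_{12} = t_1 - t_2$, and the simulated signal (white noise with integer delays chosen to give a TDOA of -2 ms, as in Figure 2, instead of speech) are assumptions for illustration.

```python
import numpy as np


def gcc_phat(x1, x2, fs, max_tau=None, eps=1e-12):
    """Return (tau_12, lags, cc): the TDOA t_1 - t_2 (s) at the GCC-PHAT peak."""
    n = len(x1) + len(x2)                 # zero-pad: linear, not circular, correlation
    cross = np.fft.rfft(x1, n=n) * np.conj(np.fft.rfft(x2, n=n))
    cross /= np.abs(cross) + eps          # PHAT weighting: discard magnitude
    cc = np.fft.irfft(cross, n=n)
    if max_tau is None:
        max_shift = n // 2
    else:
        max_shift = min(max(int(fs * max_tau), 1), n // 2)
    # Reorder so that index i corresponds to a lag of (i - max_shift) samples
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    lags = np.arange(-max_shift, max_shift + 1)
    tau = lags[np.argmax(cc)] / fs
    return tau, lags, cc


rng = np.random.default_rng(1)
fs = 16000
s = rng.standard_normal(fs)               # 1 s of white noise standing in for speech
d1, d2 = 10, 42                           # hypothetical propagation delays (samples)
x1 = np.concatenate((np.zeros(d1), s))[:fs] + 0.1 * rng.standard_normal(fs)
x2 = np.concatenate((np.zeros(d2), s))[:fs] + 0.1 * rng.standard_normal(fs)

tau, lags, cc = gcc_phat(x1, x2, fs, max_tau=0.01)
print(f"estimated TDOA: {tau * 1e3:.2f} ms")  # true value: (10 - 42) / fs = -2 ms
```

The `max_tau` argument restricts the peak search to physically plausible lags. In practice this would be set from the microphone spacing divided by the speed of sound. Each TDOA found this way constrains the source to one hyperbola branch, as in Figure 1.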
