Uncertainty Quantification and Risk Control for Multi-Speaker Sound Source Localization

Vadim Rozenfeld, Bracha Laufer Goldshtein

Abstract

Reliable Sound Source Localization (SSL) plays an essential role in many downstream tasks, where informed decision making depends not only on accurate localization but also on the confidence in each estimate. This need for reliability becomes even more pronounced in challenging conditions, such as reverberant environments and multi-source scenarios. However, existing SSL methods typically provide only point estimates, offering limited or no Uncertainty Quantification (UQ). We leverage the Conformal Prediction (CP) framework and its extensions for controlling general risk functions to develop two complementary UQ approaches for SSL. The first assumes that the number of active sources is known and constructs prediction regions that cover the true source locations. The second addresses the more challenging setting where the source count is unknown, first reliably estimating the number of active sources and then forming corresponding prediction regions. We evaluate the proposed methods on extensive simulations and real-world recordings across varying reverberation levels and source configurations. Results demonstrate reliable finite-sample guarantees and consistent performance for both known and unknown source-count scenarios, highlighting the practical utility of the proposed frameworks for uncertainty-aware SSL.

Paper Structure (34 sections, 2 theorems, 41 equations, 4 figures, 4 tables, 5 algorithms)

Key Result

Theorem 1

Suppose $L_i(\lambda)$ is non-increasing in $\lambda$ and right-continuous, and that it satisfies $L_i(\lambda_{\max})\le\alpha$ for $\lambda_{\max}=\sup\Lambda\in\Lambda$ and $\sup_\lambda L_i(\lambda)\le B<\infty$ almost surely. Then $\mathbb{E}[L_{n+1}(\hat{\lambda})]\le\alpha$.
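
The theorem as excerpted here does not spell out how $\hat{\lambda}$ is chosen. In the standard Conformal Risk Control formulation, it is the smallest $\lambda$ for which $\frac{n}{n+1}\hat{R}_n(\lambda)+\frac{B}{n+1}\le\alpha$, where $\hat{R}_n(\lambda)$ is the average loss over the $n$ calibration samples. The Python sketch below assumes that definition, a finite grid standing in for $\Lambda$, and a toy binary miscoverage loss. The function name `crc_calibrate` and the toy scores are hypothetical and are not taken from the paper's implementation.

```python
from typing import Callable, List, Sequence


def crc_calibrate(
    loss_fn: Callable[[float], Sequence[float]],
    lambda_grid: Sequence[float],
    alpha: float,
    bound: float = 1.0,
) -> float:
    """Smallest grid value whose inflated empirical risk is at most alpha.

    loss_fn(lam) returns the n calibration losses L_1(lam), ..., L_n(lam),
    assumed non-increasing in lam and bounded above by `bound` (B in the theorem).
    """
    grid = sorted(lambda_grid)
    for lam in grid:
        losses = loss_fn(lam)
        n = len(losses)
        empirical_risk = sum(losses) / n
        # Standard CRC condition (assumed): n/(n+1) * R_hat + B/(n+1) <= alpha.
        if n / (n + 1) * empirical_risk + bound / (n + 1) <= alpha:
            return lam
    # Fall back to lambda_max, where the theorem assumes L_i <= alpha.
    return grid[-1]


# Toy calibration data (hypothetical): one nonconformity score per sample.
scores: List[float] = [i / 25 for i in range(1, 25)]  # n = 24


def miscoverage(lam: float) -> List[float]:
    # Loss is 1 when the score exceeds the threshold, so it shrinks as lam grows.
    return [1.0 if s > lam else 0.0 for s in scores]


lam_hat = crc_calibrate(miscoverage, [j / 25 for j in range(26)], alpha=0.1)
```

With 24 calibration scores and $\alpha=0.1$, the condition tolerates at most one miscovered calibration sample, so this sketch selects the grid value 0.92, just below the largest score.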

Figures (4)

  • Figure 1: Overview of the proposed UQ frameworks. For a known source count $K$, calibration samples with $K$ DOAs and their likelihood maps are used, and CRC-SSL-N calibrates thresholds $(\lambda_1,\ldots,\lambda_K)$ to control individual MC risks at level $\alpha_{\mathrm{MC}}$. For an unknown source count, calibration samples across detection thresholds $\beta$ yield $\widehat{K}(\beta)$ detections and likelihood maps, and PT-SSL-U jointly calibrates $(\lambda_1,\ldots,\lambda_{K_{\max}},\beta_{\mathrm{TH}})$ to control MC and MD risks at levels $\alpha_{\mathrm{MC}}$ and $\alpha_{\mathrm{MD}}$ with high probability $1-\delta$.
  • Figure 2: 'CRC-SSL-N' prediction regions applied to SRP-DNN likelihood maps at 90% (top) and 95% (bottom) coverage, for $T_{60}=700$ ms and SNR $=15$ dB. Blue and red Xs mark true and detected sources, respectively, ordered by detection. Brighter colors indicate higher likelihood. Regions expand adaptively along likelihood contours rather than taking fixed geometric shapes.
  • Figure 3: 'PT-SSL-U' results for varying calibration-set sizes with risk tolerances $\alpha_{\mathrm{MC}}=\alpha_{\mathrm{MD}}=0.1$ (dashed lines) and significance level $\delta=0.1$. Top: SRP-PHAT. Bottom: SRP-DNN.
  • Figure 4: Estimated PDFs of $\beta$ peak values for $K_{\mathrm{max}}=2$.
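
Figure 2 describes regions that grow along likelihood contours. One simple way to obtain that behavior, offered here as an illustrative assumption rather than the paper's actual construction, is to keep every grid point whose likelihood lies within a threshold $\lambda$ of the map's peak. The function names and the toy one-dimensional map below are hypothetical.

```python
from typing import List, Sequence


def likelihood_region(likelihood: Sequence[float], lam: float) -> List[int]:
    """Indices of grid points whose likelihood is within `lam` of the peak."""
    peak = max(likelihood)
    return [i for i, value in enumerate(likelihood) if value >= peak - lam]


def is_covered(region: Sequence[int], true_index: int) -> bool:
    """True when the grid index of the true source lies inside the region."""
    return true_index in region


# Hypothetical likelihood values over five candidate directions.
toy_map = [0.1, 0.3, 0.9, 0.7, 0.2]
small = likelihood_region(toy_map, 0.0)    # peak only
medium = likelihood_region(toy_map, 0.25)  # peak plus its high-likelihood neighbor
large = likelihood_region(toy_map, 1.0)    # whole grid
```

Because the regions are nested as $\lambda$ grows, a miscoverage loss defined on them is non-increasing in $\lambda$, which is the monotonicity that Theorem 1 requires.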

Theorems & Definitions (2)

  • Theorem 1: Conformal Risk Control
  • Theorem 2