
SLAM-based Joint Calibration of Multiple Asynchronous Microphone Arrays and Sound Source Localization

Jiang Wang, Yuanzheng He, Daobilige Su, Katsutoshi Itoyama, Kazuhiro Nakadai, Junfeng Wu, Shoudong Huang, Youfu Li, He Kong

TL;DR

This work tackles the calibration of multiple asynchronous microphone arrays for 3D sound source localization by framing the problem as batch SLAM. It develops a Fisher information-based observability analysis that derives identifiability conditions for the array poses, initial time offsets, sampling clock differences, and source trajectory. It also introduces an initialization procedure whose output serves as the initial guess for Gauss–Newton batch optimization. In extensive simulations and real-world experiments, the proposed pipeline achieves higher accuracy and faster convergence than both optimization initialized with noise-corrupted ground-truth values and existing calibration frameworks. The approach reduces reliance on hardware synchronization, enabling scalable, precise robot-audition systems in 3D environments across varied scene scales.
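To make the batch-optimization step concrete, here is a minimal Gauss–Newton sketch on a deliberately simplified toy problem, not the paper's full model. It assumes the array positions are known and synchronized, that only one static source position is unknown, and that the measurements are range differences (speed of sound times TDOA) relative to array 0. The function names and geometry are hypothetical. The paper's joint problem also estimates array orientations, time offsets, clock differences, and a whole source trajectory, but every Gauss–Newton iteration follows the same pattern: linearize the residual, then solve a least-squares step.

```python
import numpy as np

# Toy model (assumption): known, synchronized arrays; one static source; no clock terms.

def range_differences(s, arrays):
    """h(s): distance from source s to array k minus distance to reference array 0, k >= 1."""
    d = np.linalg.norm(arrays - s, axis=1)
    return d[1:] - d[0]

def range_difference_jacobian(s, arrays):
    """dh/ds: difference of unit vectors pointing from each array toward the source."""
    u = (s - arrays) / np.linalg.norm(s - arrays, axis=1, keepdims=True)
    return u[1:] - u[0]

def gauss_newton(z, arrays, s0, max_iters=50, tol=1e-12):
    """Minimize ||z - h(s)||^2 over the source position s by Gauss-Newton."""
    s = np.asarray(s0, dtype=float).copy()
    for _ in range(max_iters):
        r = z - range_differences(s, arrays)          # current residual
        J = range_difference_jacobian(s, arrays)      # linearization of h at s
        step, *_ = np.linalg.lstsq(J, r, rcond=None)  # least-squares solution of J @ step = r
        s += step
        if np.linalg.norm(step) < tol:                # stop once the update is negligible
            break
    return s

# Hypothetical geometry: five array positions (m) and a true source position
arrays = np.array([[0, 0, 0], [4, 0, 0], [0, 4, 0], [0, 0, 3], [4, 4, 2]], dtype=float)
s_true = np.array([1.5, 2.0, 1.0])
z = range_differences(s_true, arrays)                 # noise-free synthetic measurements
s_hat = gauss_newton(z, arrays, s0=[2.0, 2.0, 1.5])
print(s_hat)
```

The starting point matters: Gauss–Newton converges only locally, which is why the pipeline invests in an initialization stage before the batch optimization.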

Abstract

Robot audition systems with multiple microphone arrays have many practical applications. However, accurately calibrating multiple microphone arrays remains challenging because many unknown parameters must be identified, including the relative transforms (i.e., orientation and translation) and the asynchronous factors (i.e., initial time offset and sampling clock difference) between microphone arrays. To tackle these challenges, we adopt batch simultaneous localization and mapping (SLAM) for the joint calibration of multiple asynchronous microphone arrays and sound source localization. Using the Fisher information matrix (FIM) approach, we first conduct an observability analysis (i.e., a parameter identifiability analysis) of this calibration problem and establish necessary/sufficient conditions under which the FIM and the Jacobian matrix have full column rank, which implies that the unknown parameters are identifiable. We also identify several scenarios in which the unknown parameters are not uniquely identifiable. Subsequently, we propose an effective framework for initializing the unknown parameters; its output serves as the initial guess for batch SLAM-based calibration of the microphone arrays, aiming to further improve optimization accuracy and convergence. Extensive numerical simulations and real-world experiments verify the performance of the proposed method. The results show that the proposed pipeline achieves higher accuracy and fast convergence compared with both optimization initialized with the noise-corrupted ground truth of the unknown parameters and other existing frameworks.
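The asynchronous factors named above can be illustrated with a linear least-squares (LLS) fit, the technique listed for this step of the initialization in Figure 2(d). The clock model here is an assumption for illustration, not necessarily the paper's exact formulation. After the predicted propagation delays are subtracted, the remaining arrival-time mismatch of one array relative to a reference array is taken to be affine in the emission time, with intercept tau (initial time offset) and slope delta (sampling clock difference). The function name and the synthetic numbers are hypothetical.

```python
import numpy as np

def fit_clock_params(t_emit, residual):
    """Fit residual ~= tau + delta * t_emit by linear least squares.

    Assumed (hypothetical) model: residual is the arrival-time mismatch of one
    array relative to the reference array after removing propagation delays.
    """
    A = np.column_stack([np.ones_like(t_emit), t_emit])  # design matrix [1, t]
    (tau, delta), *_ = np.linalg.lstsq(A, residual, rcond=None)
    return float(tau), float(delta)

# Synthetic check: 40 sound events emitted 0.5 s apart, small timing noise
rng = np.random.default_rng(0)
t_emit = np.arange(40) * 0.5
true_tau, true_delta = 0.05, 2e-4
residual = true_tau + true_delta * t_emit + rng.normal(0.0, 1e-5, t_emit.size)
tau_hat, delta_hat = fit_clock_params(t_emit, residual)
print(f"tau ~ {tau_hat:.4f} s, delta ~ {delta_hat:.2e}")
```

Because the model is linear in (tau, delta), this step has a closed-form solution and needs no initial guess of its own.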

Paper Structure

This paper contains 27 sections, 5 theorems, 72 equations, 10 figures, 6 tables, and 1 algorithm.

Key Result

Theorem 1

The Jacobian matrix $\mathbf{J}$ is of full column rank if and only if the following matrix is of full column rank.
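Theorem 1 reduces identifiability to a full-column-rank test, which is usually checked numerically by counting singular values above a tolerance. The sketch below applies such a test to a simplified toy model (known, synchronized arrays; one static source; range-difference measurements). This model is an assumption for illustration, not the matrix in Theorem 1. The sketch also relies on a standard fact: for a positive-definite noise covariance $\Sigma$, $\mathbf{J}$ has full column rank exactly when the FIM $\mathbf{J}^\top \Sigma^{-1} \mathbf{J}$ is nonsingular. The collinear configuration is only a loose toy analogue of the co-linear unobservable cases in Figure 3. All names are hypothetical.

```python
import numpy as np

def tdoa_jacobian(s, arrays):
    """Jacobian of range differences ||s - x_k|| - ||s - x_0|| (k >= 1) w.r.t. source s."""
    u = (s - arrays) / np.linalg.norm(s - arrays, axis=1, keepdims=True)
    return u[1:] - u[0]

def fisher_information(J, sigma):
    """FIM for i.i.d. Gaussian measurement noise with standard deviation sigma."""
    return J.T @ J / sigma**2

def has_full_column_rank(M):
    # matrix_rank counts singular values above an SVD-based default tolerance
    return np.linalg.matrix_rank(M) == M.shape[1]

# Generic toy geometry: four non-coplanar arrays
s = np.array([1.5, 2.0, 1.0])
generic = np.array([[0, 0, 0], [4, 0, 0], [0, 4, 0], [0, 0, 3]], dtype=float)
J_generic = tdoa_jacobian(s, generic)
print(has_full_column_rank(J_generic), has_full_column_rank(fisher_information(J_generic, 0.01)))

# Degenerate toy: source and all arrays on the x-axis, so only x is constrained
line = np.array([[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0]], dtype=float)
J_line = tdoa_jacobian(np.array([1.5, 0.0, 0.0]), line)
print(has_full_column_rank(J_line), np.linalg.matrix_rank(J_line))
```

In the degenerate case every Jacobian row points along the x-axis, so motion of the source in y or z leaves the toy measurements unchanged.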

Figures (10)

  • Figure 1: Geometry of the problem setup and batch SLAM-based framework for multiple microphone arrays calibration and sound source localization.
  • Figure 2: Initialization of the unknown parameters of the microphone arrays and sound source. (a) Estimation of the initial position of the sound source by triangulation. (b) Estimation of the distances between the sound source and the microphone arrays using 3D geometry. (c) Estimation of the microphone arrays' initial positions and orientations using ICP. (d) Estimation of the inter-array initial time offsets and sampling clock differences using LLS.
  • Figure 3: The scenarios for microphone array calibration and the corresponding variations in the rank of the $\mathbf{F}$ matrices. (a) The geometric relationships between the moving sound source and multiple microphone arrays in two observable cases. (b) Variation of the $\mathbf{F}$ matrix rank with the movement of the source in two observable cases. (c) The geometric relationships when the moving sound source remains co-linear or co-planar with $\left\{ \mathrm{\mathbf{x}}_{arr\_1}\right\}$. (d) Variation of the $\mathbf{F}$ matrix rank in the corresponding unobservable scenarios. (e) The geometric relationships when the moving sound source remains co-linear with $\left\{ \mathrm{\mathbf{x}}_{arr\_2}\right\}$ or $\theta_{arr\_4,7}^{y} = \pi/2$. (f) Variation of the $\mathbf{F}$ matrix rank in the corresponding unobservable scenarios.
  • Figure 4: Estimation results of the preset trajectory with 5 microphone arrays and 24 sound signals. (a) The initial and the true values of microphone array positions, orientations, and sound source positions. (b) The fine-tuned and true values of microphone array positions, orientations, and sound source positions. (c) The initial, fine-tuned, and true values of microphone array time offsets and sampling clock differences between microphone arrays.
  • Figure 5: Estimation results of the random trajectory with 5 microphone arrays and 80 sound signals. (a) The initial and the true values of microphone array positions, orientations, and sound source positions. (b) The fine-tuned and true values of microphone array positions, orientations, and sound source positions. (c) The initial, fine-tuned, and true values of microphone array time offsets and sampling clock differences between microphone arrays.
  • ...and 5 more figures
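Step (a) of Figure 2, triangulating an initial source position, can be sketched with the classic least-squares intersection of bearing lines. This finds the point that minimizes the sum of squared perpendicular distances to every line $\mathbf{x}_k + t\,\mathbf{d}_k$. It is a generic technique chosen here as an assumption, since this summary does not say how the paper performs triangulation. The sketch also assumes that the array positions and DOA unit vectors are already expressed in a common frame. The geometry is hypothetical.

```python
import numpy as np

def triangulate(origins, directions):
    """Least-squares point closest to the lines origin_k + t * direction_k.

    Solves sum_k P_k (p - x_k) = 0, where P_k = I - d_k d_k^T projects onto
    the plane orthogonal to the unit direction d_k.
    """
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for x, d in zip(origins, directions):
        d = d / np.linalg.norm(d)          # normalize the bearing
        P = np.eye(3) - np.outer(d, d)     # projector orthogonal to the bearing
        A += P
        b += P @ x
    return np.linalg.solve(A, b)           # singular only if all bearings are parallel

# Hypothetical setup: noise-free bearings from four arrays toward one source
origins = np.array([[0, 0, 0], [4, 0, 0], [0, 4, 0], [0, 0, 3]], dtype=float)
source = np.array([1.5, 2.0, 1.0])
bearings = source - origins
estimate = triangulate(origins, bearings)
print(estimate)
```

With noisy DOAs, the same normal equations return the point that best agrees with all bearings in the least-squares sense. This is why it is suitable as a starting value rather than as a final estimate.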

Theorems & Definitions (5)

  • Theorem 1
  • Theorem 2
  • Theorem 3
  • Theorem 4
  • Theorem 5