SLAM-based Joint Calibration of Multiple Asynchronous Microphone Arrays and Sound Source Localization
Jiang Wang, Yuanzheng He, Daobilige Su, Katsutoshi Itoyama, Kazuhiro Nakadai, Junfeng Wu, Shoudong Huang, Youfu Li, He Kong
TL;DR
This work calibrates multiple asynchronous microphone arrays for 3D sound source localization by framing the problem as batch SLAM. It develops a Fisher information matrix (FIM)-based observability analysis to derive identifiability conditions for array poses, time offsets, clock differences, and source trajectories, and introduces an initialization procedure whose output seeds the Gauss–Newton batch optimization. In extensive simulations and real-world experiments, the proposed pipeline achieves higher accuracy with fast convergence than optimization initialized from noise-corrupted ground-truth values and than other existing calibration frameworks. Because the time offsets and clock differences are estimated jointly with the array poses, the approach reduces reliance on hardware synchronization between arrays in 3D robot-audition setups.
Abstract
Robot audition systems with multiple microphone arrays have many practical applications. However, accurate calibration of multiple microphone arrays remains challenging because many unknown parameters must be identified, including the relative transforms (i.e., orientation and translation) and the asynchronous factors (i.e., initial time offset and sampling clock difference) between microphone arrays. To tackle these challenges, we adopt batch simultaneous localization and mapping (SLAM) for joint calibration of multiple asynchronous microphone arrays and sound source localization. Using the Fisher information matrix (FIM) approach, we first conduct an observability analysis (i.e., a parameter identifiability analysis) of this calibration problem and establish necessary/sufficient conditions under which the FIM and the Jacobian matrix have full column rank, which implies that the unknown parameters are identifiable. We also identify several scenarios in which the unknown parameters are not uniquely identifiable. Subsequently, we propose an effective framework for initializing the unknown parameters; its output serves as the initial guess in batch SLAM for multi-array calibration, with the aim of improving optimization accuracy and convergence. Extensive numerical simulations and real experiments have been conducted to verify the performance of the proposed method. The experimental results show that the proposed pipeline achieves higher accuracy with fast convergence than both methods that use the noise-corrupted ground truth of the unknown parameters as the initial guess in the optimization and other existing frameworks.
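The link between full column rank of the Jacobian and identifiability can be illustrated with a small numerical sketch. Everything in it is an assumption made for illustration and is not the paper's formulation: a 2D toy geometry, known source positions, a reference array at the origin, a single unknown array with position `p` and time offset `offset`, a simplified arrival-time-difference model, and Gaussian noise with standard deviation `sigma` (so that the FIM is J^T J / sigma^2). The function names `jacobian` and `fim_and_rank` are likewise hypothetical.

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s


def jacobian(theta, sources):
    """Analytic Jacobian of a toy measurement model (an assumption, not the paper's).

    Model: z_k = (||s_k - p|| - ||s_k||) / C + offset,
    with unknowns theta = [p_x, p_y, offset] and known sources s_k.
    """
    p = np.asarray(theta[:2], dtype=float)
    diff = sources - p                                # vectors from array to sources
    dist = np.linalg.norm(diff, axis=1, keepdims=True)
    d_dp = -(diff / dist) / C                         # d z_k / d p
    d_doffset = np.ones((sources.shape[0], 1))        # d z_k / d offset
    return np.hstack([d_dp, d_doffset])


def fim_and_rank(theta, sources, sigma=1e-4):
    """Return the FIM (Gaussian-noise assumption) and the rank of the Jacobian."""
    J = jacobian(theta, sources)
    fim = J.T @ J / sigma**2
    # Rank is computed on J rather than the FIM to avoid squaring the conditioning.
    return fim, np.linalg.matrix_rank(J)


theta = np.array([0.5, 0.3, 0.05])  # hypothetical array position (m) and offset (s)

# Sources seen from several distinct directions.
spread = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.5], [2.0, 2.0], [-1.5, -1.0]])
# Sources all on one ray starting at the array: every bearing is identical.
ray = np.array([[0.5 + t, 0.3] for t in (1.0, 2.0, 3.0, 4.0, 5.0)])

fim_good, rank_good = fim_and_rank(theta, spread)
fim_bad, rank_bad = fim_and_rank(theta, ray)

print("rank with spread sources:", rank_good)  # full column rank means 3 here
print("rank with sources on a ray:", rank_bad)  # rank deficient in this toy model
```

In this toy model the spread configuration yields a full-column-rank Jacobian, so all three parameters can be distinguished locally, while the ray configuration makes every Jacobian row identical and the parameters cannot be separated. The degenerate case is only an artifact of the simplified model and is not claimed to match the unidentifiable scenarios reported in the paper.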
