LocaGen: Sub-Sample Time-Delay Learning for Beam Localization

Ishaan Kunwar, Henry Cantor, Tyler Rizzo, Ayaan Qayyum

TL;DR

LocaGen addresses the challenge of accurate 2-D beam localization with compact microphone arrays by training models on a scalable synthetic-data framework that mimics real-world timing and noise. By combining GCC-PHAT-derived TDOAs with ML-based refinement (random forest classification and MLP regression) and an audio denoiser, it significantly reduces quantization-induced errors and enables accurate DOA and position estimates on low-power hardware. The approach is validated through synthetic experiments showing substantial gains at high sampling rates (e.g., MAE ~2.87° at 48 kHz) and a real-world pipeline using 3 mics with 10 cm spacing, suggesting strong potential for embedded SAR drone applications. The work demonstrates how synthetic data and sub-sample delay learning can extend the capabilities of small arrays, enabling lighter hardware without sacrificing localization performance.
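The TL;DR names GCC-PHAT as the source of the raw TDOAs that the ML stages refine. The following is a hedged, minimal NumPy sketch of a standard GCC-PHAT estimator, not the authors' code. The function name `gcc_phat`, the `max_tau` bound, and the speed of sound of 343 m/s are our own assumptions. Because the estimator picks the peak at an integer lag, its output can only take values in steps of 1/fs. This is the kind of sampling quantization error the abstract describes.

```python
import numpy as np


def gcc_phat(sig, ref, fs, max_tau):
    """Delay of `sig` relative to `ref` in seconds, via GCC-PHAT (sketch).

    A positive result means `sig` arrives later than `ref`. The peak is taken
    at an integer lag, so the estimate is quantized to steps of 1 / fs.
    """
    n = len(sig) + len(ref)                  # zero-pad so the correlation is linear, not circular
    cross = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    cross /= np.abs(cross) + 1e-12           # PHAT weighting: keep phase, drop magnitude
    cc = np.fft.irfft(cross, n=n)            # generalized cross-correlation, lag 0 at index 0
    max_shift = max(1, min(int(fs * max_tau), n // 2 - 1))
    # Reorder so that index i corresponds to lag (i - max_shift).
    cc = np.concatenate((cc[-max_shift:], cc[: max_shift + 1]))
    lag = int(np.argmax(cc)) - max_shift     # integer-sample peak
    return lag / fs


# Usage: white noise delayed by exactly 2 samples at 10 kHz.
rng = np.random.default_rng(0)
fs = 10_000                                  # Hz
x = rng.standard_normal(2048)
y = np.concatenate((np.zeros(2), x[:-2]))    # y is x delayed by 2 samples
tau = gcc_phat(y, x, fs, max_tau=0.10 / 343.0)  # bound from 10 cm spacing, c = 343 m/s (assumed)
```

Bounding the search to the physically possible delay (spacing / c) keeps spurious peaks outside that range from being selected. With 10 cm spacing at 10 kHz, the bound allows only lags of -2 to 2 samples.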

Abstract

The goal of LocaGen is to improve the localization of audio signals in the 2-D beam localization problem. LocaGen reduces sampling quantization errors using machine learning models trained on realistic synthetic data generated by a simulation. The system improves the accuracy of both direction-of-arrival (DOA) estimation and precise location estimation of an audio beam from an array of three microphones. We demonstrate LocaGen's efficacy on a low-power embedded system, where it increases localization accuracy with only a minimal increase in real-time resource usage. LocaGen reduced DOA error by approximately 67% even with a microphone array sampled at only 10 kHz.
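The following arithmetic illustrates why sampling quantization matters for a compact array. It assumes a single microphone pair, a far-field source, and c = 343 m/s. Under the model tau = d·sin(theta)/c, an integer-lag TDOA can only produce a handful of distinct angles. The helper `integer_lag_angles` is hypothetical and only enumerates that grid. It is not part of LocaGen.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed (air at roughly 20 degrees C)


def integer_lag_angles(fs, spacing, c=SPEED_OF_SOUND):
    """DOA angles (degrees) reachable when the TDOA is rounded to whole samples.

    Far-field model for one mic pair: tau = spacing * sin(theta) / c,
    with tau restricted to k / fs for integer k.
    """
    max_lag = int(np.floor(fs * spacing / c))
    lags = np.arange(-max_lag, max_lag + 1)
    return np.degrees(np.arcsin(lags * c / (fs * spacing)))


for fs in (10_000, 48_000):
    angles = integer_lag_angles(fs, spacing=0.10)
    steps = np.diff(angles)
    # steps[angles.size // 2] is the gap between lag 0 and lag 1 (broadside).
    print(f"{fs} Hz: {angles.size} angles, step near broadside {steps[angles.size // 2]:.1f} deg")
```

For a 10 cm pair, 10 kHz allows only 5 integer lags, and the step from 0 to 1 sample is about 20°. At 48 kHz there are 27 lags and a broadside step of about 4°. Recovering sub-sample delays is what closes these gaps.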

Paper Structure

This paper contains 20 sections, 9 equations, 6 figures.

Figures (6)

  • Figure 1: Localization Diagram. The angular location of the scream's source is determined from the differences in time of arrival and triangulated with three microphones.
  • Figure 2: Pipeline Flow. The diagram shows the sequential application of a random forest machine learning model after the GCC-PHAT calculation to output an estimated DOA value.
  • Figure 3: Waveform Comparison. The waveforms of the mixed, target, and filtered audio are compared. The filtered (green) and target (dark brown) waveforms are nearly identical, indicating that the denoiser preserves the clean signal without over-extracting it. The model also removed most of the drone noise from the mixed input (orange).
  • Figure 4: FFT Diagram. The diagram shows that the high-amplitude drone components (in dB, y-axis) at higher frequencies (x-axis) are removed. The filtered output matches the target in both frequency content and amplitude.
  • Figure 5: RF and Triangulation Algorithm Error Comparison. This histogram compares the error of the RF model against that of a traditional triangulation algorithm. The absolute angle error is more tightly concentrated around $0^\circ$ for the RF model than for the algorithmic alternative, indicating a clear improvement from machine learning.
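Figure 1 describes triangulating the source angle from time-of-arrival differences at three microphones. Figure 5 compares the RF model against a traditional triangulation algorithm. The baseline's exact form is not given here, so the following is a hypothetical sketch of one standard far-field approach: a least-squares DOA from the two TDOAs measured relative to a reference microphone. The equilateral 10 cm geometry, c = 343 m/s, and the helper names `simulate_tdoas` and `doa_from_tdoas` are assumptions made for illustration.

```python
import numpy as np

C = 343.0  # speed of sound in m/s (assumed)

# Hypothetical array: equilateral triangle with 10 cm sides, positions in metres.
MICS = np.array([
    [0.00, 0.0],
    [0.10, 0.0],
    [0.05, 0.10 * np.sqrt(3.0) / 2.0],
])


def simulate_tdoas(theta_deg, mics=MICS, c=C):
    """Exact far-field TDOAs t_i - t_0 (seconds) for mics 1 and 2."""
    u = np.array([np.cos(np.radians(theta_deg)), np.sin(np.radians(theta_deg))])
    arrival = -(mics @ u) / c          # mics farther toward the source hear it earlier
    return arrival[1:] - arrival[0]


def doa_from_tdoas(tdoas, mics=MICS, c=C):
    """Far-field DOA in degrees, counter-clockwise from +x, in [0, 360).

    Plane-wave model: t_i - t_0 = -(p_i - p_0) . u / c, where u points from
    the array toward the source. Solves for u by least squares.
    """
    baselines = mics[1:] - mics[0]                 # rows are p_i - p_0
    rhs = -c * np.asarray(tdoas, dtype=float)      # (p_i - p_0) . u
    u, *_ = np.linalg.lstsq(baselines, rhs, rcond=None)
    return float(np.degrees(np.arctan2(u[1], u[0])) % 360.0)


fs = 10_000                                        # Hz, the low rate cited in the abstract
true_theta = 37.0
exact = simulate_tdoas(true_theta)
quantized = np.round(exact * fs) / fs              # whole-sample TDOAs, as from integer-lag peak picking
print(doa_from_tdoas(exact), doa_from_tdoas(quantized))
```

With exact delays the sketch recovers 37°. Rounding the same delays to whole samples at 10 kHz shifts the estimate by roughly 12° in this example. Quantization errors of this kind are what a learned sub-sample refinement stage is intended to reduce.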