LocaGen: Sub-Sample Time-Delay Learning for Beam Localization
Ishaan Kunwar, Henry Cantor, Tyler Rizzo, Ayaan Qayyum
TL;DR
LocaGen addresses the challenge of accurate 2-D beam localization with compact microphone arrays by training models on a scalable synthetic-data framework that mimics real-world timing and noise. It combines time-difference-of-arrival (TDOA) estimates derived from GCC-PHAT with ML-based refinement (random forest classification and MLP regression) and an audio denoiser. This significantly reduces quantization-induced errors and enables accurate direction-of-arrival (DOA) and position estimates on low-power hardware. The approach is validated through synthetic experiments showing substantial gains at high sampling rates (e.g., a mean absolute error (MAE) of ~2.87° at 48 kHz) and through a real-world pipeline using 3 mics with 10 cm spacing, suggesting strong potential for embedded SAR drone applications. The work demonstrates how synthetic data and sub-sample delay learning can extend the capabilities of small arrays, enabling lighter hardware without sacrificing localization performance.
Abstract
The goal of LocaGen is to improve the localization of audio signals in the 2-D beam localization problem. LocaGen reduces sampling quantization errors through machine learning models trained on realistic synthetic data generated by a simulation. The system increases the accuracy of both direction-of-arrival (DOA) estimation and position estimation of an audio beam using an array of three microphones. We demonstrate LocaGen on a low-power embedded system, where it increases localization accuracy with only a minimal increase in real-time resource usage. LocaGen reduced DOA error by approximately 67%, even with the microphone array's audio processed at only 10 kHz.
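To make the quantization problem concrete, the following is a minimal sketch and not the authors' implementation. It assumes a single far-field microphone pair, a speed of sound of 343 m/s, and the standard relation sin(theta) = c * lag / (fs * d) between an integer sample lag and the arrival angle. The function names `gcc_phat_lag` and `reachable_angles_deg` are hypothetical and chosen only for illustration.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not taken from the paper


def gcc_phat_lag(x, y, max_lag):
    """Integer-sample delay of y relative to x, estimated with GCC-PHAT.

    A positive result means y arrives later than x. max_lag must be >= 1.
    """
    n = len(x) + len(y)                 # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n=n)
    Y = np.fft.rfft(y, n=n)
    cross = Y * np.conj(X)
    cross /= np.abs(cross) + 1e-12      # PHAT weighting: keep phase only
    cc = np.fft.irfft(cross, n=n)
    # Reorder so the array covers lags -max_lag .. +max_lag
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return int(np.argmax(cc)) - max_lag


def reachable_angles_deg(fs_hz, spacing_m, c=SPEED_OF_SOUND):
    """Arrival angles that integer-sample lags can represent for one mic pair."""
    max_lag = int(np.floor(spacing_m * fs_hz / c))
    lags = np.arange(-max_lag, max_lag + 1)
    sin_theta = lags * c / (fs_hz * spacing_m)   # far-field assumption
    return np.degrees(np.arcsin(sin_theta))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal(1024)
    y = np.concatenate((np.zeros(2), x[:-2]))    # x delayed by 2 samples
    print("estimated lag:", gcc_phat_lag(x, y, max_lag=2))

    for fs in (10_000, 48_000):
        angles = reachable_angles_deg(fs, spacing_m=0.10)
        print(fs, "Hz:", len(angles), "distinct angles")
```

Under these assumptions, a 10 cm pair sampled at 10 kHz spans fewer than three samples of delay, so integer lags can represent only 5 distinct angles, compared with 27 at 48 kHz. This coarse angular grid is the quantization error that a learned sub-sample refinement stage is meant to reduce.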
