Using perceptive subbands analysis to perform audio scenes cartography
Laurent Millot, Gérard Pelé, Mohammed Elliq
TL;DR
This work presents a perceptive subbands framework for audio scene cartography in stereo recordings, decomposing signals into 10 irregular subbands and computing interchannel delay and attenuation laws ($\Delta t$, $\Delta E$) under a short-time scene assumption to infer source count and incidence. It uses a non-downsampling, linear-phase FIR filter bank to produce perceptually relevant subbands and employs ISD-based energy ratios to enable re-synthesis and analysis. Through experiments on scenes with 2–4 sources, including a moving source, the approach demonstrates that subband- and histogram-based cues can reveal individual sources and approximate locations, while highlighting challenges in continuous motion estimation and source separation. The paper also discusses extensions with physical microphone models, motion libraries, neural-network identification, and plans for real-time multichannel simulators and cross-platform implementation to enhance practical audio engineering applications.
Abstract
Audio scene cartography for real or simulated stereo recordings is presented. The analysis proceeds in three successive steps: a perceptive 10-subband analysis; calculation, for each subband, of the temporal laws for the relative delay and gain between the two channels, using a short-time constant-scene assumption and the inter-correlation of the channels, which makes it possible to follow a mobile source as it moves; and calculation of global and per-subband histograms whose peaks give the incidence information for fixed sources. Audio scenes composed of 2 to 4 fixed sources, or of one fixed source and one mobile source, have already been tested successfully. Further extensions and applications will be discussed. Audio illustrations of audio scenes and of the subband analysis, and a demonstration of real-time stereo recording simulations, will be given.

Paper 6340 presented at the 118th Convention of the Audio Engineering Society, Barcelona, 2005
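
The delay, gain, and histogram steps named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a single band (the 10-subband filter bank is omitted), a frame-wise cross-correlation peak as the delay estimate, and a frame energy ratio in dB as the gain estimate. The function names (`frame_delay_and_gain`, `delay_gain_laws`, `histogram_peak`), the frame length, hop size, and maximum lag are all hypothetical choices made for illustration.

```python
import numpy as np


def frame_delay_and_gain(left, right, max_lag):
    """Estimate interchannel delay (samples) and level difference (dB) for one frame."""
    n = len(left)
    lags = np.arange(-max_lag, max_lag + 1)
    # Cross-correlation c[k] = sum_n left[n] * right[n + k] over the valid indices.
    # A positive peak lag means the right channel lags the left channel.
    xcorr = np.array([
        np.dot(left[max(0, -k):n - max(0, k)], right[max(0, k):n - max(0, -k)])
        for k in lags
    ])
    delay = int(lags[np.argmax(xcorr)])
    eps = 1e-12  # guards against log of zero on silent frames
    level_db = 10.0 * np.log10((np.sum(right**2) + eps) / (np.sum(left**2) + eps))
    return delay, level_db


def delay_gain_laws(left, right, frame_len=1024, hop=512, max_lag=32):
    """Frame-by-frame delay and level laws, assuming the scene is constant within a frame."""
    delays, levels = [], []
    for start in range(0, len(left) - frame_len + 1, hop):
        sl = slice(start, start + frame_len)
        d, g = frame_delay_and_gain(left[sl], right[sl], max_lag)
        delays.append(d)
        levels.append(g)
    return np.array(delays), np.array(levels)


def histogram_peak(delays, max_lag):
    """Return the most frequent delay; its histogram peak hints at a fixed source."""
    counts = np.bincount(delays + max_lag, minlength=2 * max_lag + 1)
    return int(np.argmax(counts)) - max_lag


# Synthetic test scene (an assumption, not data from the paper): one noise source,
# with the right channel delayed by 5 samples and attenuated by a factor of 0.5.
rng = np.random.default_rng(0)
source = rng.standard_normal(8192)
left = source
right = 0.5 * np.concatenate([np.zeros(5), source[:-5]])

delays, levels = delay_gain_laws(left, right)
peak_lag = histogram_peak(delays, max_lag=32)
print(peak_lag, float(np.median(levels)))
```

In this toy scene the histogram peak should sit at the imposed delay and the median level difference should be close to the imposed attenuation. With several fixed sources, one would expect several histogram peaks, and with a mobile source the frame-by-frame delay law would drift over time rather than stay constant.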
