Using perceptive subbands analysis to perform audio scenes cartography

Laurent Millot, Gérard Pelé, Mohammed Elliq

TL;DR

This work presents a perceptive subbands framework for audio scene cartography in stereo recordings, decomposing signals into 10 irregular subbands and computing interchannel delay and attenuation laws ($\Delta t$, $\Delta E$) under a short-time scene assumption to infer source count and incidence. It uses a non-downsampling, linear-phase FIR filter bank to produce perceptually relevant subbands and employs ISD-based energy ratios to enable re-synthesis and analysis. Through experiments on scenes with 2–4 sources, including a moving source, the approach demonstrates that subband- and histogram-based cues can reveal individual sources and approximate locations, while highlighting challenges in continuous motion estimation and source separation. The paper also discusses extensions with physical microphone models, motion libraries, neural-network identification, and plans for real-time multichannel simulators and cross-platform implementation to enhance practical audio engineering applications.
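The non-downsampling, linear-phase FIR decomposition mentioned above can be illustrated with a minimal sketch. It is not the authors' filter bank: the windowed-sinc design, the Hamming window, the tap count, and the function names `bandpass_fir` and `split_subbands` are all assumptions made for illustration, and the band edges must be supplied by the caller because the summary gives only one of the ten subbands (800-1200 Hz).

```python
import numpy as np

def bandpass_fir(f_lo, f_hi, fs, num_taps=513):
    """Windowed-sinc band-pass FIR (assumed design, not the paper's).

    Symmetric taps with an odd length give a linear-phase filter whose
    group delay is (num_taps - 1) / 2 samples.
    """
    if num_taps % 2 == 0:
        raise ValueError("num_taps must be odd")
    n = np.arange(num_taps) - (num_taps - 1) / 2

    def lowpass(fc):
        # Ideal low-pass impulse response with cutoff fc (Hz).
        return 2 * fc / fs * np.sinc(2 * fc / fs * n)

    # Band-pass as the difference of two low-pass responses, then windowed.
    return (lowpass(f_hi) - lowpass(f_lo)) * np.hamming(num_taps)

def split_subbands(x, edges, fs, num_taps=513):
    """Filter x into len(edges) - 1 subbands without downsampling.

    Every subband keeps the input length and sample rate; the common
    group delay is removed so the subbands stay time-aligned with x.
    """
    delay = (num_taps - 1) // 2
    bands = []
    for f_lo, f_hi in zip(edges[:-1], edges[1:]):
        h = bandpass_fir(f_lo, f_hi, fs, num_taps)
        y = np.convolve(x, h)  # full convolution
        bands.append(y[delay:delay + len(x)])
    return np.array(bands)
```

Because every filter shares the same length and symmetry, all subbands have the same delay, which is what keeps interchannel comparisons within a subband meaningful. For a stereo recording, `split_subbands` would be called once per channel with the same `edges`.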

Abstract

Audio scene cartography for real or simulated stereo recordings is presented. The analysis proceeds in three successive steps: a perceptive 10-subband analysis; calculation of temporal laws for the relative delays and gains between the two channels of each subband, using a short-time constant-scene assumption and interchannel correlation, which makes it possible to follow a mobile source as it moves; and calculation of global and per-subband histograms whose peaks give the incidence information for fixed sources. Audio scenes composed of 2 to 4 fixed sources, or of one fixed source and one mobile source, have already been tested successfully. Further extensions and applications will be discussed. Audio illustrations of audio scenes, subband analysis, and a demonstration of real-time stereo recording simulations will be given.

Paper 6340 presented at the 118th Convention of the Audio Engineering Society, Barcelona, 2005
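The delay and gain laws and the histogram step described in the abstract can be sketched as follows for a single subband. This is a minimal illustration under stated assumptions, not the paper's implementation: the frame length, hop, lag range, silence threshold, the sign convention (positive delay means the right channel lags the left), the use of a right-to-left energy ratio in dB as the gain measure, and all function names are hypothetical.

```python
import numpy as np

def frame_delay_and_gain(left, right, fs, max_lag):
    """Delay (s) and level difference (dB) for one short-time frame.

    The delay is the lag that maximizes the interchannel cross-correlation.
    Assumed convention: positive delay means right lags left.
    """
    # full[i] = sum_n right[n + k] * left[n], with k = i - (len(left) - 1)
    full = np.correlate(right, left, mode="full")
    zero = len(left) - 1
    window = full[zero - max_lag: zero + max_lag + 1]
    lag = int(np.argmax(window)) - max_lag
    # Assumed gain measure: energy ratio right/left in dB.
    gain_db = 10 * np.log10(np.sum(right ** 2) / np.sum(left ** 2))
    return lag / fs, gain_db

def delay_gain_laws(left, right, fs, frame_len=2048, hop=1024, max_lag=64):
    """Temporal laws: one (delay, gain) pair per frame of a subband pair."""
    delays, gains = [], []
    for start in range(0, len(left) - frame_len + 1, hop):
        l = left[start:start + frame_len]
        r = right[start:start + frame_len]
        # Skip near-silent frames, where both estimates are meaningless.
        if np.sum(l ** 2) < 1e-12 or np.sum(r ** 2) < 1e-12:
            continue
        d, g = frame_delay_and_gain(l, r, fs, max_lag)
        delays.append(d)
        gains.append(g)
    return np.array(delays), np.array(gains)

def histogram_peak(values, bin_edges):
    """Center of the most populated histogram bin."""
    counts, edges = np.histogram(values, bins=bin_edges)
    i = int(np.argmax(counts))
    return 0.5 * (edges[i] + edges[i + 1])
```

A fixed source should produce nearly constant per-frame estimates and therefore a sharp histogram peak, whereas a mobile source spreads its estimates over many bins and is better read from the temporal laws themselves. With several simultaneous sources, a single cross-correlation maximum per frame only captures the dominant one in that frame and subband, which is one reason to run the analysis per subband.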

Paper Structure (8 sections, 15 figures)

This paper contains 8 sections, 15 figures.

Figures (15)

  • Figure 1: Plots of the left and right ISD for a real audio scene recorded in a rail station.
  • Figure 2: Left and right ISD for a synthetic audio scene: static bass, organ moving on a circle around the stereophonic couple.
  • Figure 3: Global histograms for the two-source synthetic audio scene: relative interchannel delays (left) and attenuations (right).
  • Figure 4: Temporal laws for subband 5 (800-1200 Hz) in the case of the two-source synthetic audio scene: relative interchannel delays (left) and attenuations (right).
  • Figure 5: Left and right ISD for a synthetic audio scene: static bass, solo guitar, guitar and banjo.
  • ...and 10 more figures