
Definition-independent Formalization of Soundscapes: Towards a Formal Methodology

Mikel D. Jedrusiak, Thomas Harweg, Timo Haselhoff, Bryce T. Lawrence, Susanne Moebus, Frank Weichert

TL;DR

This work tackles the fragmentation of soundscape research by introducing a definition-independent formalization that encodes a soundscape as $\boldsymbol{x} = (\mathcal{G}, \mathcal{A}, \mathcal{S}, \mathcal{T})$, integrating geospatial data, audio signals, isolated sound sources, and temporal structure. It demonstrates the approach in an empirical SALVE-based study that derives frequency correlation matrices (FCMs) from daily recordings and analyzes them in two ways: (i) a $\beta$-VAE whose latent FCM embeddings are clustered and (ii) a 3D CNN that classifies seven-day sequences of FCMs into land-use categories. The results show that FCMs can compactly represent urban acoustic environments and support definition-agnostic classification, although device-level variability and the limited dataset size restrict generalization. Overall, the formalization provides a common, interoperable basis for cross-disciplinary soundscape analysis and paves the way for broader validation and extension across domains and definitions.
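The tuple $\boldsymbol{x} = (\mathcal{G}, \mathcal{A}, \mathcal{S}, \mathcal{T})$ can be pictured as a simple record type. The Python sketch below is illustrative only: the class names `GeoContext` and `Soundscape` and all concrete field types are our own assumptions, since the formalization defines the four components abstractly.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, List, Tuple

# Hypothetical container types: the formalization names the components
# G, A, S, T, but the concrete types used here are illustrative assumptions.


@dataclass
class GeoContext:
    """G: geospatial information, e.g. sensor position and a land-use label."""
    latitude: float
    longitude: float
    land_use: str  # hypothetical label such as "residential"


@dataclass
class Soundscape:
    """x = (G, A, S, T): one soundscape observation."""
    geo: GeoContext                   # G: geodata
    audio: List[float]                # A: audio signal (mono samples)
    sources: Dict[str, List[float]]   # S: isolated sources, name -> signal
    timestamps: List[datetime] = field(default_factory=list)  # T: temporal structure

    def as_tuple(self) -> Tuple[GeoContext, List[float], Dict[str, List[float]], List[datetime]]:
        """Return the components in the order (G, A, S, T)."""
        return (self.geo, self.audio, self.sources, self.timestamps)


# Usage with placeholder values:
x = Soundscape(
    geo=GeoContext(0.0, 0.0, "residential"),
    audio=[0.0, 0.1, -0.1],
    sources={"traffic": [0.0, 0.05, -0.05]},
    timestamps=[datetime(2021, 6, 1, 12, 0)],
)
G, A, S, T = x.as_tuple()
```

Keeping the four components in one record lets, for example, land-use labels from the geodata travel together with the audio and its isolated sources through an analysis pipeline; a fuller implementation would likely need richer types, such as polygons for land-use areas.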

Abstract

Soundscapes have been studied by researchers from various disciplines, each with different perspectives, goals, approaches, and terminologies. Accordingly, what counts as a component of a soundscape varies by field, and with it the basic definition of a soundscape. This complicates interdisciplinary communication and the comparison of results, especially when research areas unrelated to soundscapes are involved. For this reason, we present a potential formalization that is independent of the underlying soundscape definition, with the goal of capturing both the heterogeneous structure of the data and the different conceptual viewpoints in one model. We show a practical application of this formalization in an exemplary analysis that uses frequency correlation matrices, as an alternative to features such as MFCCs, for land-use type detection.
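One way to compute a daily FCM is sketched below in Python with NumPy. It assumes that an FCM holds the squared Pearson correlation between every pair of frequency bins across the time frames of one day's recordings, matching the scale named in the Figure 2 caption. The plain Hann-windowed STFT, the frame length, and the hop size are our own choices rather than the paper's parameters, and the function names are hypothetical.

```python
import numpy as np


def magnitude_spectrogram(signal: np.ndarray, frame_len: int = 1024, hop: int = 512) -> np.ndarray:
    """Hann-windowed magnitude STFT with shape (frame_len // 2 + 1, n_frames)."""
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1)).T


def daily_fcm(recordings: list[np.ndarray], frame_len: int = 1024, hop: int = 512) -> np.ndarray:
    """Squared Pearson correlation between frequency bins over all frames of one day."""
    # Concatenate the frames of every recording of the day along the time axis.
    spec = np.concatenate(
        [magnitude_spectrogram(r, frame_len, hop) for r in recordings], axis=1
    )
    # np.corrcoef treats rows as variables (frequency bins) and columns as
    # observations (time frames), giving an (F, F) Pearson matrix.
    with np.errstate(invalid="ignore", divide="ignore"):
        r = np.corrcoef(spec)
    r = np.nan_to_num(r, nan=0.0)  # constant (e.g. silent) bins yield NaN; set to 0
    return r ** 2
```

In contrast to MFCCs, which summarize the spectral envelope of each frame, this sketch captures how energy in different frequency bins co-varies over a whole day, yielding one fixed-size $F \times F$ matrix per day regardless of how many recordings were made.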

Paper Structure

This paper contains 10 sections, 10 equations, 5 figures, and 1 table.

Figures (5)

  • Figure 1: The basic idea of our formalization: a layered, time-dependent ($\mathcal{T}$) model with sound sources $\mathcal{S}$ and geodata $\mathcal{G}$.
  • Figure 2: Frequency correlation matrices computed from all recordings of one day, shown per recording location. The scale is the squared Pearson correlation coefficient.
  • Figure 3: Network architecture of the convolutional variational autoencoder (VAE). Single FCMs are used as input and target.
  • Figure 4: Our clustering result. Each point corresponds to an FCM encoded by the VAE; colors uniquely identify the recording device used.
  • Figure 5: Our architecture with a temporally sorted input of frequency correlation matrices for land use type classification.
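The classifier in Figure 5 consumes a temporally sorted sequence of FCMs. The Python sketch below shows one way to assemble such an input from seven equally sized daily FCMs keyed by date. `build_week_tensor` is a hypothetical helper, the consecutive-day check is an assumption added for illustration, and the output follows the `(N, C, D, H, W)` layout expected by common 3D-convolution layers such as PyTorch's `torch.nn.Conv3d`.

```python
from datetime import date, timedelta

import numpy as np

SEQ_LEN = 7  # seven daily FCMs per sample


def build_week_tensor(daily_fcms: dict[date, np.ndarray]) -> np.ndarray:
    """Stack seven daily FCMs in chronological order into shape (1, 1, 7, F, F).

    Axes: batch, channel, depth (days), height, width.
    """
    if len(daily_fcms) != SEQ_LEN:
        raise ValueError(f"expected {SEQ_LEN} days, got {len(daily_fcms)}")
    days = sorted(daily_fcms)  # temporal sorting of the input sequence
    # Assumption: the seven days must be consecutive calendar days.
    if any(b - a != timedelta(days=1) for a, b in zip(days, days[1:])):
        raise ValueError("days are not consecutive")
    volume = np.stack([daily_fcms[d] for d in days])  # (7, F, F)
    return volume[np.newaxis, np.newaxis]  # add batch and channel axes
```

Treating the days as the depth axis lets 3D convolution kernels pick up patterns across adjacent days as well as within each matrix.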