Seeing Beyond Sound: Visualization and Abstraction in Audio Data Representation

Ashlae Blumé

TL;DR

This paper addresses the misalignment between human auditory perception and traditional 2D audio visualizations, along with the opacity of legacy software. It proposes three design principles (transparency, flexibility, and robustness) grounded in cognitive load theory and visual design, and demonstrates them with Jellyfish Dynamite, a Python-based interactive tool that offers multiple spectral transforms and is built on a model-view-controller (MVC) architecture. By enabling simultaneous, multisensory representations and user-driven exploration, the work aims to improve pattern recognition, foster inclusivity (including citizen science), and support collaborative workflows in audio information research. The practical impact is a more adaptable, transparent, and engaging framework for analyzing complex audio data across professional, educational, and public contexts.

Abstract

In audio signal processing, visual representation of complex information enhances pattern recognition because it aligns with human perceptual systems. Software tools that carry hidden assumptions inherited from their historical contexts risk misalignment with modern workflows as their design origins become obscured. We argue that tools aligned with emergent needs improve analytical and creative outputs because users have a greater affinity for using them. This paper uses the Jellyfish Dynamite software to explore the potential of adding dimensionality and interactivity to visualization tools to facilitate complex workflows in audio information research.

Paper Structure

This paper contains 40 sections, 3 figures, 1 table.

Figures (3)

  • Figure 1: Comparison of power spectral density transformations using FFT, CQT, wavelet, chirplet, and multi-resolution methods, displayed horizontally, for a time-sequence of audio syllables, displayed vertically.
  • Figure 2: Jellyfish Dynamite Interface. Plot shows an audio power spectrum with spectrogram overlays, peak connections, and energy tracking lines. Interface controls contain buttons, switches, and instructions for use. Data tables contain ready-to-export peak selections.
  • Figure 3: (Left) Full comparison of dual-scale spectrogram selections, visualizing every possible combination of nfft values (512, 1024, 2048) and hop lengths (2, 4, 8). (Right) Fully-connected peak plot showing PSD, spectrogram, and energy ridges for a single audio file from Jellyfish Dynamite's interactive interface.
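
The parameter sweep described in the Figure 3 caption can be sketched as a plain grid enumeration. This is a minimal illustration, not Jellyfish Dynamite's actual code: the function names `parameter_grid` and `frequency_resolution` are hypothetical, and the 44.1 kHz sample rate is an assumed value that does not come from the paper. The caption does not say whether the hop values are sample counts or divisors of nfft, so the sketch only pairs them up without interpreting them.

```python
from itertools import product

# Values taken from the Figure 3 caption.
NFFT_VALUES = (512, 1024, 2048)
HOP_VALUES = (2, 4, 8)

# Assumed sample rate for illustration only; not stated in the source.
SAMPLE_RATE_HZ = 44100


def parameter_grid(nfft_values=NFFT_VALUES, hop_values=HOP_VALUES):
    """Return every (nfft, hop) pairing as a list of tuples."""
    return list(product(nfft_values, hop_values))


def frequency_resolution(nfft, sample_rate):
    """Width of one FFT bin in Hz: sample_rate / nfft."""
    return sample_rate / nfft


grid = parameter_grid()
for nfft, hop in grid:
    width = frequency_resolution(nfft, SAMPLE_RATE_HZ)
    print(f"nfft={nfft:5d}  hop={hop}  bin width={width:.2f} Hz")
```

Three nfft values and three hop values give nine pairings. Larger nfft values narrow each frequency bin at the cost of a longer analysis window, which is the trade-off a side-by-side comparison of this kind makes visible.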