Audio Atlas: Visualizing and Exploring Audio Datasets
Luca A. Lanzendörfer, Florian Grötschla, Uzeyir Valizada, Roger Wattenhofer
TL;DR
The paper addresses the difficulty of visualizing and exploring large, unlabeled audio collections. It introduces Audio Atlas, an open-source web application that combines text-audio CLAP embeddings, a t-SNE projection to 2D, and a Milvus vector database to support scalable semantic search, exploration, and zero-shot classification. Key contributions are a modular, dataset-agnostic platform, the integration of multiple audio datasets, and an interactive DeepScatter-based frontend for rapid navigation of clusters, neighbors, and metadata. The approach supports qualitative assessment of embedding quality, efficient similarity search via Annoy, and practical data exploration workflows for researchers working with large-scale audio data. In practice, it gives researchers a scalable, interactive tool for understanding dataset structure, identifying patterns and outliers, and comparing embedding models on real-world collections.
Abstract
We introduce Audio Atlas, an interactive web application for visualizing audio data using text-audio embeddings. Audio Atlas is designed to facilitate the exploration and analysis of audio datasets using a contrastive embedding model and a vector database for efficient data management and semantic search. The system maps audio embeddings into a two-dimensional space and leverages DeepScatter for dynamic visualization. Designed for extensibility, Audio Atlas allows easy integration of new datasets, enabling users to better understand their audio data and identify both patterns and outliers. We open-source the codebase of Audio Atlas, and provide an initial implementation containing various audio and music datasets.
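To make the idea of semantic search and zero-shot classification over a shared text-audio embedding space concrete, here is a minimal sketch in plain Python. It is not the Audio Atlas implementation: the three-dimensional vectors are hand-made stand-ins for CLAP embeddings, the file names and labels are invented, and the brute-force `search` function is a hypothetical substitute for the vector database used by the real system. The only assumption carried over from the text is that audio clips and text prompts live in the same space, so that cosine similarity between them is meaningful.

```python
import math


def normalize(v):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in v]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    a, b = normalize(a), normalize(b)
    return sum(x * y for x, y in zip(a, b))


def search(query_emb, index, k=2):
    """Brute-force nearest-neighbor search (stand-in for a vector database)."""
    scored = [(cosine(query_emb, emb), item_id) for item_id, emb in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [item_id for _, item_id in scored[:k]]


def zero_shot_label(audio_emb, label_embs):
    """Pick the label whose text embedding is closest to the audio embedding."""
    return max(label_embs, key=lambda label: cosine(audio_emb, label_embs[label]))


# Toy data: hypothetical audio embeddings (a real system would compute these
# with a contrastive text-audio model such as CLAP).
audio_index = {
    "dog_bark.wav": [0.9, 0.1, 0.0],
    "piano_solo.wav": [0.0, 0.2, 0.9],
    "wolf_howl.wav": [0.8, 0.3, 0.1],
}

# Hypothetical text embeddings of candidate class prompts.
label_embeddings = {
    "animal": [1.0, 0.2, 0.0],
    "music": [0.0, 0.1, 1.0],
}

# Hypothetical embedding of a free-text query such as "a barking dog".
text_query = [1.0, 0.0, 0.0]

top_hits = search(text_query, audio_index, k=2)
predicted = zero_shot_label(audio_index["piano_solo.wav"], label_embeddings)

print(top_hits)
print(predicted)
```

In this sketch, a text query and a class prompt are handled identically: both are just vectors compared against audio vectors. That is why a single embedding model can back both search and zero-shot classification; a production system would replace the linear scan with an approximate nearest-neighbor index.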
