ASPED: An Audio Dataset for Detecting Pedestrians

Pavan Seshadri; Chaeyeon Han; Bon-Woo Koo; Noah Posner; Subhrajit Guhathakurta; Alexander Lerch

TL;DR

ASPED introduces a novel audio-only pedestrian detection task and a large-scale dataset collected at Georgia Tech campuses to enable research in audio-based urban sensing. The authors benchmark three model families (VGGish, a CONV-based encoder, and the Audio Spectrogram Transformer, AST) on 1-second frames across four proximity radii, with data-imbalance mitigation. Results indicate that audio cues can reveal pedestrian presence; AST and CONV perform best, but both leave room for improvement in noisy urban environments. The dataset and baselines open avenues for noise-robust models, for extension to more environments, and for regression-based counting.
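To make the framing and labeling setup concrete, here is a minimal sketch of how 1-second frames, per-frame binary labels, and a simple imbalance correction might be computed. It is an illustration under stated assumptions, not the authors' pipeline: the helper names (`frame_audio`, `binary_labels`, `inverse_frequency_weights`) are hypothetical, the per-frame pedestrian counts are assumed to come from the annotations for one radius, and inverse-frequency class weighting is a common stand-in for whatever imbalance mitigation the paper actually uses.

```python
import numpy as np


def frame_audio(signal, sample_rate, frame_seconds=1.0):
    """Split a mono signal into non-overlapping frames; drop the trailing remainder."""
    signal = np.asarray(signal)
    frame_len = int(sample_rate * frame_seconds)
    n_frames = len(signal) // frame_len
    return signal[: n_frames * frame_len].reshape(n_frames, frame_len)


def binary_labels(pedestrian_counts):
    """Map per-frame pedestrian counts (for one radius) to 0/1 presence labels."""
    return (np.asarray(pedestrian_counts) > 0).astype(np.int64)


def inverse_frequency_weights(labels):
    """Assumed mitigation: weight each class by n_samples / (n_classes * class_count)."""
    labels = np.asarray(labels)
    class_counts = np.bincount(labels, minlength=2).astype(float)
    class_counts = np.maximum(class_counts, 1.0)  # avoid division by zero
    return len(labels) / (2.0 * class_counts)


# Toy usage: 10 samples at a 4 Hz "sample rate" give two 1-second frames.
frames = frame_audio(np.arange(10, dtype=float), sample_rate=4)
labels = binary_labels([0, 0, 0, 2])        # hypothetical counts for one radius
weights = inverse_frequency_weights(labels)  # rarer positive class gets more weight
```

Repeating the labeling step once per radius would give one binary target per radius for every frame; the resulting weights could then be passed to a weighted loss.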

Abstract

We introduce the new audio analysis task of pedestrian detection and present a new large-scale dataset for this task. While the preliminary results prove the viability of using audio approaches for pedestrian detection, they also show that this challenging task cannot be easily solved with standard approaches.

Paper Structure (14 sections, 2 equations, 6 figures, 1 table)

This paper contains 14 sections, 2 equations, 6 figures, 1 table.

Introduction
Related work
Dataset
Data acquisition
Annotations
Experiments
Experimental setup
Model architectures
Feature extraction
Training procedure
Hyperparameters and implementation
Experiments
Results
Conclusion

Figures (6)

Figure 1: Research team installing audio recorders in the field.
Figure 2: Pedestrian detection video setup.
Figure 3: Number of pedestrians within radius $r = 6$ m by hour of day.
Figure 4: Recall for each class over recording radius. Positive and negative classes are denoted by "+" and "-", respectively.
Figure 5: Macro average accuracy using the VGGISH, CONV, and AST models.
...and 1 more figure
