ASPED: An Audio Dataset for Detecting Pedestrians

Pavan Seshadri, Chaeyeon Han, Bon-Woo Koo, Noah Posner, Subhrajit Guhathakurta, Alexander Lerch

TL;DR

ASPED introduces a novel audio-only pedestrian detection task and a large-scale dataset collected at Georgia Tech campuses to enable research in audio-based urban sensing. The authors benchmark three model families—VGGish, a CONV-based encoder, and the Audio Spectrogram Transformer (AST)—on 1-second frames across four proximity radii, with data-imbalance mitigation. Results indicate that audio cues can reveal pedestrian presence, with AST and CONV providing the strongest performance but still leaving room for improvement in noisy urban environments. The dataset and baselines open avenues for noise-robust models and for extending to more environments and regression-based counting.
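To make the frame-level setup concrete, the sketch below derives one binary label per 1-second frame for each proximity radius. It is only an illustration under stated assumptions: the radii values, the `label_frames` helper, and the input format (a list of recorder-to-pedestrian distances per frame) are hypothetical and are not taken from the paper.

```python
from typing import Dict, List, Sequence

# Hypothetical radii in meters. The summary only says that four radii are
# used and names r = 6 m in a figure caption; the other values are assumed.
RADII_M: Sequence[float] = (1.0, 3.0, 6.0, 9.0)


def label_frames(
    distances_per_frame: List[List[float]],
    radii_m: Sequence[float] = RADII_M,
) -> Dict[float, List[int]]:
    """Return one binary label per 1-second frame for each radius.

    distances_per_frame[i] holds the recorder-to-pedestrian distances (in
    meters) observed during frame i; an empty list means nobody was seen.
    A frame is positive for radius r if any pedestrian is within r.
    """
    labels: Dict[float, List[int]] = {}
    for r in radii_m:
        labels[r] = [
            int(any(d <= r for d in frame)) for frame in distances_per_frame
        ]
    return labels


# Four example frames: empty, one close pedestrian, two farther away, one far.
example_frames = [[], [0.5], [4.0, 8.0], [10.0]]
example_labels = label_frames(example_frames)
```

With this input, larger radii mark more frames as positive, which is one reason the class balance can differ from radius to radius.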

Abstract

We introduce the new audio analysis task of pedestrian detection and present a new large-scale dataset for this task. While the preliminary results prove the viability of using audio approaches for pedestrian detection, they also show that this challenging task cannot be easily solved with standard approaches.

Paper Structure (14 sections, 2 equations, 6 figures, 1 table)

Figures (6)

  • Figure 1: Research team installing audio recorders in the field.
  • Figure 2: Pedestrian detection video setup.
  • Figure 3: Number of pedestrians within radius $r = 6\,\mathrm{m}$ by hour of day.
  • Figure 4: Recall for each class over recording radius. Positive and negative classes are denoted by "+" and "-", respectively.
  • Figure 5: Macro average accuracy using the VGGISH, CONV, and AST models.
  • ...and 1 more figure
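
The per-class recall and macro average accuracy named in the captions of Figures 4 and 5 can be sketched as follows. This is a minimal illustration, assuming that "macro average accuracy" means the unweighted mean of the per-class recalls; the function names and the toy labels are hypothetical and do not reproduce the authors' evaluation code or results.

```python
import math
from typing import Dict, Sequence


def per_class_recall(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[int, float]:
    """Recall for the negative (0) and positive (1) class of a binary task."""
    recalls: Dict[int, float] = {}
    for cls in (0, 1):
        total = sum(1 for t in y_true if t == cls)
        hits = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        # Recall is undefined when a class never occurs in the ground truth.
        recalls[cls] = hits / total if total else math.nan
    return recalls


def macro_average(recalls: Dict[int, float]) -> float:
    """Unweighted mean of the per-class recalls (assumed metric definition)."""
    return sum(recalls.values()) / len(recalls)


def plain_accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of frames classified correctly, ignoring class balance."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Toy imbalanced example: 8 negative frames, 2 positive frames,
# and a degenerate classifier that always predicts "no pedestrian".
toy_true = [0] * 8 + [1] * 2
toy_pred = [0] * 10

toy_recalls = per_class_recall(toy_true, toy_pred)
toy_macro = macro_average(toy_recalls)
toy_accuracy = plain_accuracy(toy_true, toy_pred)
```

In the toy example the always-negative classifier scores high on plain accuracy but only chance level on the macro average, which shows why a class-balanced metric is useful when positive frames are rare.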