DB3V: A Dialect Dominated Dataset of Bird Vocalisation for Cross-corpus Bird Species Recognition

Xin Jing, Luyang Zhang, Jiangjian Xie, Alexander Gebhard, Alice Baird, Bjoern Schuller

TL;DR

The study tackles cross-dialect bird species recognition by introducing DB3V, a publicly available, dialect-dominated vocalisation dataset spanning 10 species across three regions of the contiguous United States (CONUS). It establishes baselines using TDNN models with four normalisation strategies, showing that dialectal variation reduces cross-region performance and highlighting the importance of frequency-domain normalisation (IFN) for cross-dialect generalisation. Key contributions include a thoroughly preprocessed, openly licensed dataset (~91,752 seconds of audio) and a set of cross-region baseline results that quantify the impact of dialect on recognition. The work lays the groundwork for improved normalisation techniques and cross-dialect benchmarking, with potential implications for large-scale, autonomous bird monitoring and biodiversity assessments.

Abstract

In ornithology, it's widely acknowledged that bird species display diverse dialects in their calls across different regions. Consequently, computational methods to identify bird species solely through their calls face significant challenges. There is growing interest in understanding the impact of species-specific dialects on the effectiveness of bird species recognition methods. Despite potential mitigation through the expansion of dialect datasets, the absence of publicly available testing data currently impedes robust benchmarking efforts. This paper presents the Dialect Dominated Dataset of Bird Vocalisation, the first cross-corpus dataset that focuses on dialects in bird vocalisations. The DB3V comprises more than 25 hours of audio recordings from 10 bird species distributed across three distinct regions in the contiguous United States (CONUS). In addition to presenting the dataset, we conduct analyses and establish baseline models for cross-corpus bird recognition. The data and code are publicly available online: https://zenodo.org/records/11544734
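
As a quick consistency check, the total duration quoted in the TL;DR (~91,752 seconds) can be converted to hours and compared with the "more than 25 hours" stated in the abstract. The sketch below is illustrative only; the constant and function names are assumptions introduced here, not part of the released code.

```python
# Approximate total duration quoted in the TL;DR (in seconds).
TOTAL_SECONDS = 91_752


def seconds_to_hours(seconds: float) -> float:
    """Convert a duration in seconds to hours."""
    return seconds / 3600


total_hours = seconds_to_hours(TOTAL_SECONDS)
print(f"{total_hours:.2f} hours")  # 25.49 hours
```

The result, roughly 25.5 hours, agrees with the abstract's figure.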

Paper Structure (10 sections, 3 figures, 3 tables)

This paper contains 10 sections, 3 figures, 3 tables.

Figures (3)

  • Figure 1: A map of the contiguous United States, illustrating three distinct geographical regions (D1-D3) delineated by variations in climate and vegetation patterns. Markers indicate the coordinates of the audio samples in DB3V, with different marker colours representing different regions. The map was generated using ArcGIS.
  • Figure 2: Spectrograms of three typical bird species from different regions across the contiguous United States in DB3V. The variations in frequency range, intervals, and other features of the vocalisations indicate the existence of regional dialects within the same bird species.
  • Figure 3: Number of segments for each bird species across regions, with the corresponding species codes (#0 to #9) given in Table \ref{tab:bird-info}.