Detection and Classification of Cetacean Echolocation Clicks using Image-based Object Detection Methods applied to Advanced Wavelet-based Transformations

Christopher Hauer

TL;DR

This thesis shows the efficacy of CLICK-SPOT on Norwegian killer whale underwater recordings provided by the cetacean biologist Dr. Vester.

Abstract

A central challenge in marine bioacoustic analysis is detecting animal signals, such as calls, whistles, and clicks, for behavioral studies. Manual labeling is too time-consuming to cover enough data for reliable results, so an automatic solution is necessary. Basic mathematical models can detect events in simple environments, but they struggle with complex scenarios, such as detecting signals with a low signal-to-noise ratio or distinguishing clicks from echoes. Deep neural networks (DNNs), such as ANIMAL-SPOT, are better suited to such tasks. DNNs process audio signals as image representations, often spectrograms computed with the Short-Time Fourier Transform (STFT). However, spectrograms are limited by the uncertainty principle, which imposes a tradeoff between time and frequency resolution. Alternatives such as the wavelet transform, which provides better time resolution at high frequencies and better frequency resolution at low frequencies, may offer advantages for feature extraction in complex bioacoustic environments. This thesis shows the efficacy of CLICK-SPOT on Norwegian killer whale underwater recordings provided by the cetacean biologist Dr. Vester.

Keywords: Bioacoustics, Deep Learning, Wavelet Transformation
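The time-frequency tradeoff mentioned in the abstract can be made concrete with a little arithmetic. The sketch below is illustrative only and is not taken from the thesis: the sampling rate (192 kHz) and the segment sizes (16 and 1024 samples) are borrowed from the Figure 1 caption, while the function names and the choice of 6 cycles per wavelet are assumptions made for this example.

```python
from __future__ import annotations


def stft_resolution(sample_rate_hz: float, segment_size: int) -> tuple[float, float]:
    """Return (segment duration in seconds, frequency bin spacing in Hz).

    Both values are fixed for the whole spectrogram, and their product is
    always 1: a shorter segment sharpens time but coarsens frequency.
    """
    duration_s = segment_size / sample_rate_hz
    bin_spacing_hz = sample_rate_hz / segment_size
    return duration_s, bin_spacing_hz


def wavelet_duration(center_freq_hz: float, n_cycles: float = 6.0) -> float:
    """Approximate duration of a wavelet holding a fixed number of cycles.

    n_cycles is a hypothetical parameter chosen for illustration. The
    duration shrinks as the analyzed frequency rises, which is the
    frequency-dependent resolution described in the abstract.
    """
    return n_cycles / center_freq_hz


SAMPLE_RATE_HZ = 192_000  # from the Figure 1 caption

for segment_size in (16, 1024):
    duration_s, bin_hz = stft_resolution(SAMPLE_RATE_HZ, segment_size)
    print(f"STFT segment {segment_size:>4}: {duration_s * 1e6:8.1f} us, {bin_hz:8.1f} Hz per bin")

for freq_hz in (10_000, 80_000):
    print(f"Wavelet at {freq_hz} Hz: {wavelet_duration(freq_hz) * 1e6:6.1f} us")
```

With a 16-sample segment the frequency bins are 12 kHz apart, whereas a 1024-sample segment narrows them to 187.5 Hz but spans more than 5 ms, longer than an entire 2 ms click window. The fixed-cycle wavelet avoids committing to a single compromise for all frequencies.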

Paper Structure (56 sections, 1 equation, 32 figures, 12 tables)

Figures (32)

  • Figure 1: Six example images of the waveform and spectrogram of different clicks and echoes in isolation. The differing SNR values can result from the distance between the emitted click and the hydrophone and from the directionality of clicks. The spectrogram was generated with a segment size of 16 samples and a hop of 8 samples. These values are very low compared to the more commonly used segment size of 1024 and hop of 512, which is necessary because the window contains only 384 samples (2 milliseconds at 192 kHz). This explanation also applies to the other figures.
  • Figure 2: Three example images of click and echo pairs. Image (a) shows a typical click and echo pair, with a click that starts with a positive amplitude and an echo that starts with a negative amplitude. Due to a weak initial phase, the echo in image (b) appears to start with a positive amplitude. In image (c), because the impulse is strong compared to the initial amplitude, the click appears to start with a negative amplitude and the echo with a positive amplitude.
  • Figure 3: An example of the waveform, spectrogram, and labels in Audacity. The equidistant click and echo pairs marked with green boxes are part of a burst. The click and echo pairs marked with orange boxes are also easy to distinguish, but are not part of the same burst as those marked in green. The events marked with red boxes are labeled, but are difficult to identify and potentially mislabeled.
  • Figure 4: Example of a low-frequency click [image (a)], a high-frequency click [image (b)] and an ultrasonic-frequency click [image (c)].
  • Figure 5: Examples of a reverberated echo in image (a) and a frequency-shifted reverberated echo in image (b). The echo in image (b) has a higher peak and average energy than the click.
  • ...and 27 more figures
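
The spectrogram parameters in the Figure 1 caption can be checked with a short calculation. The sketch below is a minimal illustration, assuming an STFT without padding and a one-sided spectrum; the function name `stft_shape` is hypothetical, and a real library may pad the signal and therefore report a different frame count.

```python
def stft_shape(n_samples: int, segment_size: int, hop: int) -> tuple:
    """Return (number of frames, number of frequency bins) for an unpadded STFT.

    Assumes only complete segments are kept and the spectrum is one-sided.
    """
    n_bins = segment_size // 2 + 1
    if n_samples < segment_size:
        # The window is shorter than one segment, so no full frame fits.
        return 0, n_bins
    n_frames = 1 + (n_samples - segment_size) // hop
    return n_frames, n_bins


SAMPLE_RATE_HZ = 192_000   # from the Figure 1 caption
WINDOW_SAMPLES = 384       # from the Figure 1 caption

# 384 samples at 192 kHz last 2 milliseconds, as the caption states.
print(WINDOW_SAMPLES / SAMPLE_RATE_HZ * 1000, "ms")

# Small segments as used for the figures: 47 frames of 9 bins each.
print(stft_shape(WINDOW_SAMPLES, segment_size=16, hop=8))

# The common 1024/512 setting cannot fit a single frame into the window.
print(stft_shape(WINDOW_SAMPLES, segment_size=1024, hop=512))
```

Under these assumptions, the common 1024-sample segment is longer than the entire 384-sample window, which is why the figures use much smaller values.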