Cross-Domain Knowledge Transfer for Underwater Acoustic Classification Using Pre-trained Models

Amirmohammad Mohammadi, Tejashri Kelhe, Davelle Carreiro, Alexandra Van Dine, Joshua Peeples

TL;DR

The paper investigates cross-domain transfer learning for underwater acoustic target recognition (UATR) by comparing AudioSet-pretrained PANNs with ImageNet-pretrained TIMM models on the DeepShip passive sonar dataset. Using the same preprocessing and augmentation pipeline for all models, it shows that ImageNet-pretrained TIMM models can slightly outperform audio-pretrained models, and that the audio sampling rate used for pre-training and fine-tuning affects performance. Through experiments repeated over three runs and Grad-CAM interpretability analyses, the work highlights the practical potential of cross-domain transfer learning to address data scarcity in UATR and suggests directions for incorporating self-supervised and multi-modal approaches. The findings are relevant to deploying efficient, robust underwater classifiers in real-world maritime applications where labeled data are limited.

Abstract

Transfer learning is commonly employed to leverage large, pre-trained models and fine-tune them for downstream tasks. The most prevalent pre-trained models are initially trained on ImageNet. However, their ability to generalize can vary across data modalities. This study compares pre-trained Audio Neural Networks (PANNs) and ImageNet pre-trained models in the context of underwater acoustic target recognition (UATR). The ImageNet pre-trained models were observed to slightly outperform the pre-trained audio models in passive sonar classification. We also analyzed the impact of audio sampling rate on model pre-training and fine-tuning. This study contributes to transfer learning applications in UATR, illustrating the potential of pre-trained models to address the limitations caused by scarce labeled data in the UATR domain.

Paper Structure (11 sections, 1 equation, 4 figures, 2 tables)

Figures (4)

  • Figure 1: Overall framework. (a) Data Preprocessing: The audio waveform is transformed into a log-mel spectrogram. (b) Data Augmentation: SpecAugment [park2019specaugment] and Mixup [zhang2018mixup] are applied during training to improve model robustness. (c) Data Analysis: Input spectrograms are processed using networks pre-trained on AudioSet (PANN [kong2020panns]) or ImageNet (TIMM [rw2019timm]).
  • Figure 2: Average test accuracy across three experimental runs at different sampling rates for CNN14 models. Each color represents a different model; each model's symbol is centered on its average test accuracy, and the error bars show $\pm1$ standard deviation.
  • Figure 3: Average confusion matrices across three experimental runs for (a) the best PANN model, CNN14-32k (70.6 $\pm$ 0.8), and (b) the best TIMM model, ConvNeXtV2-tiny (73.7 $\pm$ 0.8). The average test accuracy $\pm1$ standard deviation is shown in parentheses.
  • Figure 4: Visualization of Grad-CAM results for (a) CNN14 correctly classified samples, (b) CNN14 misclassified samples, (c) ConvNeXtV2-tiny correctly classified samples, and (d) ConvNeXtV2-tiny misclassified samples.
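
The augmentation step in Figure 1(b) can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a spectrogram stored as a nested list of shape [mel bins][time frames] and one-hot labels. The function names `mixup` and `mask_band`, the `alpha` default, and the mask widths are hypothetical choices made for illustration. Only the frequency- and time-masking parts of SpecAugment are shown; time warping is omitted.

```python
import random


def mixup(x_a, y_a, x_b, y_b, alpha=0.4, rng=random):
    """Blend two spectrograms and their one-hot labels (Mixup).

    alpha=0.4 is an assumed default, not a value taken from the paper.
    """
    # Mixing coefficient drawn from a symmetric Beta(alpha, alpha) distribution.
    lam = rng.betavariate(alpha, alpha)
    x = [
        [lam * a + (1.0 - lam) * b for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(x_a, x_b)
    ]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x, y, lam


def mask_band(spec, axis, max_width, rng=random, fill=0.0):
    """Mask one random contiguous band, SpecAugment-style.

    axis=0 masks mel bins (frequency masking);
    axis=1 masks time frames (time masking).
    Returns a new spectrogram; the input is left unchanged.
    """
    n_mels, n_frames = len(spec), len(spec[0])
    size = n_mels if axis == 0 else n_frames
    width = rng.randint(0, min(max_width, size))  # band width, possibly 0
    start = rng.randint(0, size - width)          # band start index
    out = [row[:] for row in spec]
    for m in range(n_mels):
        for t in range(n_frames):
            idx = m if axis == 0 else t
            if start <= idx < start + width:
                out[m][t] = fill
    return out


if __name__ == "__main__":
    rng = random.Random(0)
    # Two toy "spectrograms" with 4 mel bins and 6 frames each.
    spec_a = [[1.0] * 6 for _ in range(4)]
    spec_b = [[3.0] * 6 for _ in range(4)]
    mixed, label, lam = mixup(spec_a, [1.0, 0.0], spec_b, [0.0, 1.0], rng=rng)
    augmented = mask_band(mask_band(mixed, 0, 2, rng), 1, 3, rng)
    print(len(augmented), len(augmented[0]), label)
```

Because Mixup blends the labels with the same coefficient as the inputs, the mixed label still sums to one and can be used with a soft-target cross-entropy loss. In practice both operations would run on batched tensors during training only, leaving validation and test spectrograms unaugmented.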