Neural Edge Histogram Descriptors for Underwater Acoustic Target Recognition

Atharva Agashe, Davelle Carreiro, Alexandra Van Dine, Joshua Peeples

TL;DR

Problem: domain shifts and computational limits hinder deploying large pretrained models for underwater acoustic target recognition. Approach: adapt Neural Edge Histogram Descriptors to spectrograms, combining a structural edge-descriptor path and a statistical histogram path via $f(X) = \phi\left(\sum_{\rho \in N} \psi(x_\rho)\right)$ to extract texture features. Key findings: on the DeepShip dataset NEHD achieves $65.80\%$ accuracy with about $1.36\times 10^4$ parameters, competitive with ResNet-50 ($2.35\times 10^7$) and ViT ($2.15\times 10^7$), and significantly more efficient than PANN and AST. Significance: demonstrates that a lightweight texture-focused descriptor can match large models and can serve as a feature extractor to boost other networks for resource-constrained underwater sensing.
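
The aggregation in the TL;DR, where a per-element map $\psi$ is summed over a neighborhood $N$ and then passed through $\phi$, can be illustrated with a small sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: $\psi$ is taken to be a soft (radial basis function) bin membership, $\phi$ a normalization by neighborhood size, and the bin centers, width, and patch values are hypothetical.

```python
import math

def psi(x, centers, width):
    """Soft-bin one value: RBF membership in each bin (assumed form of psi)."""
    return [math.exp(-((x - c) / width) ** 2) for c in centers]

def phi(summed, count):
    """Normalize the pooled bin responses by the neighborhood size (assumed phi)."""
    return [s / count for s in summed]

def f(neighborhood, centers, width=0.25):
    """f(X) = phi(sum over rho in N of psi(x_rho)) for one local neighborhood."""
    summed = [0.0] * len(centers)
    for x in neighborhood:
        for k, v in enumerate(psi(x, centers, width)):
            summed[k] += v
    return phi(summed, len(neighborhood))

centers = [0.0, 0.5, 1.0]        # hypothetical bin centers
patch = [0.1, 0.4, 0.9, 0.5]     # hypothetical spectrogram values in one window

features = f(patch, centers)
shuffled = f(list(reversed(patch)), centers)
print(features)
```

Because the sum runs over the whole neighborhood before $\phi$ is applied, reordering the elements of the patch leaves the output unchanged up to floating-point rounding, which is the property that makes this form suitable for pooling local texture statistics.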

Abstract

Numerous maritime applications rely on the ability to recognize acoustic targets using passive sonar. While there is a growing reliance on pre-trained models for classification tasks, these models often require extensive computational resources and may not perform optimally when transferred to new domains due to dataset variations. To address these challenges, this work adapts the neural edge histogram descriptors (NEHD) method, originally developed for image classification, to classify passive sonar signals. We conduct a comprehensive evaluation of statistical and structural texture features, demonstrating that their combination achieves competitive performance with large pre-trained models. The proposed NEHD-based approach offers a lightweight and efficient solution for underwater target recognition, significantly reducing computational costs while maintaining accuracy.

Paper Structure

This paper contains 15 sections, 3 equations, 5 figures, 3 tables.

Figures (5)

  • Figure 1: The workflow of the proposed model for passive sonar audio classification is shown in (a). The short-time Fourier transform (STFT) is applied to the input audio signal. The neural edge histogram descriptor (NEHD) module is then used to extract structural and statistical texture features. The resulting features are then used for vessel classification.
  • Figure 2: The NEHD model architecture. Structural and statistical features are first extracted from the input spectrogram by the edge descriptors and the histogram layers, respectively. These features are then used for classification.
  • Figure 3: The heatmaps illustrate the exhaustive feature search process used to determine the optimal window length, hop length, and frequency bin parameters for the STFT feature. Each heatmap corresponds to a specific number of frequency bins: (a) 48, (b) 96, and (c) 192. The Y-axis represents the hop length, while the X-axis shows the window length. Darker shades indicate higher accuracy.
  • Figure 4: Block diagrams illustrating individual models used to evaluate the effect of statistical vs. structural texture features.
  • Figure 5: The average confusion matrices for (a) Linear Classifier (Baseline), (b) Edge Descriptors (Structural), (c) Histogram (Statistical) and (d) NEHD (Both Structural and Statistical) are shown for the DeepShip dataset. Darker shades indicate higher average accuracy. The predicted classes are shown along rows and the true labels are shown along the columns. NEHD improves the identification of each class in comparison to other models.
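
The Figure 5 caption places predicted classes along rows and true labels along columns, which is the transpose of the layout many toolkits use by default. The following sketch shows that orientation and one way to average matrices across runs. The labels, the number of classes, and the choice to average raw counts are hypothetical and not taken from the paper.

```python
def confusion_matrix(y_true, y_pred, num_classes):
    """Count matrix with predicted class on rows and true class on columns."""
    cm = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        cm[p][t] += 1
    return cm

def average_matrices(matrices):
    """Element-wise mean of equally sized square matrices (one per run)."""
    n = len(matrices)
    size = len(matrices[0])
    return [[sum(m[r][c] for m in matrices) / n for c in range(size)]
            for r in range(size)]

def per_class_accuracy(cm):
    """Fraction of each true class (a column) that falls on the diagonal."""
    size = len(cm)
    result = []
    for c in range(size):
        col_total = sum(cm[r][c] for r in range(size))
        result.append(cm[c][c] / col_total if col_total else 0.0)
    return result

# Hypothetical labels from two runs over three classes.
run_a = confusion_matrix([0, 0, 1, 2], [0, 1, 1, 2], 3)
run_b = confusion_matrix([0, 0, 1, 2], [0, 0, 1, 1], 3)
avg = average_matrices([run_a, run_b])
print(per_class_accuracy(avg))
```

With this layout, per-class accuracy is read down each column rather than across each row, so a matrix from a library that uses the opposite convention would need to be transposed before comparison.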