Structural and Statistical Audio Texture Knowledge Distillation (SSATKD) for Passive Sonar Classification
Jarin Ritu, Amirmohammad Mohammadi, Davelle Carreiro, Alexandra Van Dine, Joshua Peeples
TL;DR
The paper addresses passive sonar target classification, arguing that distilling only high-level knowledge from a teacher is insufficient. It introduces SSATKD, a framework that distills two kinds of low-level texture alongside traditional output distillation: structural texture, captured by an edge-aware, multi-scale decomposition, and statistical texture, captured by RBF-quantized co-occurrences. An uncertainty-weighted loss balances the terms. The structural module is built on a Laplacian/Gaussian pyramid, and the statistical module uses a 2D Earth Mover’s Distance to align texture distributions. On the DeepShip dataset, a lightweight student (HLTDNN) guided by strong PANN teachers shows robust improvements. Key findings include superior performance when structural and distillation losses are combined, the effectiveness of a 4-level Laplacian pyramid (LP) and 4-level RBF quantization, and favorable comparisons against several contemporary knowledge distillation methods, while remaining efficient enough for resource-constrained deployments. The framework has practical implications for real-time underwater signal classification and could extend to environmental sound recognition and bioacoustics, with potential further gains from self-supervised or multi-modal extensions.
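To make the idea of an uncertainty-weighted loss concrete, the sketch below uses one common form of learned uncertainty weighting, in which each loss term L_i is scaled by exp(-s_i) and regularized by s_i, with s_i a learnable log-variance. This summary does not state the paper's exact formulation, so the formula, the function name `uncertainty_weighted_loss`, and the example values are all illustrative assumptions.

```python
import math

def uncertainty_weighted_loss(losses, log_vars):
    """Combine loss terms as sum_i exp(-s_i) * L_i + s_i.

    s_i = log(sigma_i^2) is a learnable log-variance per term (assumed form,
    not necessarily the one used in the paper).
    """
    if len(losses) != len(log_vars):
        raise ValueError("losses and log_vars must have the same length")
    return sum(math.exp(-s) * loss + s for loss, s in zip(losses, log_vars))

# Hypothetical values standing in for the classification, distillation,
# structural, and statistical loss terms.
example_losses = [0.9, 0.4, 0.2, 0.1]
example_log_vars = [0.0, 0.0, 0.0, 0.0]  # s_i = 0 gives every term weight 1
total = uncertainty_weighted_loss(example_losses, example_log_vars)
```

With all log-variances at zero the result reduces to the plain sum of the terms. During training, a larger s_i would down-weight a noisy term while the additive s_i penalty keeps the weights from collapsing to zero.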
Abstract
Knowledge distillation has been successfully applied to various audio tasks, but its potential in underwater passive sonar target classification remains relatively unexplored. Existing methods often focus on high-level contextual information while overlooking essential low-level audio texture features needed to capture local patterns in sonar data. To address this gap, the Structural and Statistical Audio Texture Knowledge Distillation (SSATKD) framework is proposed for passive sonar target classification. SSATKD combines high-level contextual information with low-level audio textures by utilizing an Edge Detection Module for structural texture extraction and a Statistical Knowledge Extractor Module to capture signal variability and distribution. Experimental results confirm that SSATKD improves classification accuracy while optimizing memory and computational resources, making it well-suited for resource-constrained environments.
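As a rough illustration of the multi-scale decomposition behind structural texture extraction, the following is a minimal NumPy sketch of a Laplacian pyramid applied to a spectrogram-like array. It is not the authors' Edge Detection Module: the 5-tap binomial kernel, nearest-neighbour upsampling, the reading of "4-level" as four band-pass residuals plus a low-pass image, and all function names are assumptions made for this example.

```python
import numpy as np

# 5-tap binomial kernel, a common approximation of a Gaussian (assumed choice).
KERNEL = np.array([1.0, 4.0, 6.0, 4.0, 1.0]) / 16.0

def blur(img):
    """Separable low-pass filter with edge-replicated borders."""
    padded = np.pad(img, 2, mode="edge")
    rows = np.apply_along_axis(lambda r: np.convolve(r, KERNEL, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, KERNEL, mode="valid"), 0, rows)

def upsample(img, shape):
    """Nearest-neighbour upsampling by 2, cropped to the target shape."""
    up = np.repeat(np.repeat(img, 2, axis=0), 2, axis=1)
    return up[: shape[0], : shape[1]]

def laplacian_pyramid(img, levels=4):
    """Return `levels` band-pass residuals followed by the final low-pass image."""
    bands, current = [], img.astype(float)
    for _ in range(levels):
        smaller = blur(current)[::2, ::2]  # Gaussian pyramid step
        bands.append(current - upsample(smaller, current.shape))  # edge-like detail
        current = smaller
    return bands + [current]

def reconstruct(pyramid):
    """Invert the decomposition by upsampling and adding residuals back."""
    current = pyramid[-1]
    for band in reversed(pyramid[:-1]):
        current = upsample(current, band.shape) + band
    return current

# Random array standing in for a time-frequency feature map.
rng = np.random.default_rng(0)
spectrogram = rng.random((64, 48))
pyramid = laplacian_pyramid(spectrogram, levels=4)
```

Each residual keeps the detail lost between two adjacent scales, which is why the bands highlight edges, and `reconstruct(pyramid)` recovers the input up to floating-point error. In a distillation setting, such bands computed from teacher and student feature maps could be compared level by level.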
