Supervised Contrastive Machine Unlearning of Background Bias in Sonar Image Classification with Fine-Grained Explainable AI
Kamal Basha S, Athira Nambiar
TL;DR
The paper tackles background bias in sonar image classification, where models rely on seafloor cues rather than on the target object. It introduces Targeted Contrastive Unlearning (TCU) to explicitly forget seafloor features, and the Unlearn to Explain Sonar Framework (UESF) to quantify and visualize what was forgotten using differential LIME explanations. Experiments on real and synthetic sonar data show that accuracy is maintained while target recall improves and reliance on the background decreases, and the explanations make the remaining bias visible. By coupling targeted unlearning with bias-focused explanations, the work supports more robust and explainable sonar AI for underwater object classification.
Abstract
Acoustic sonar image analysis plays a critical role in object detection and classification, with applications in both civilian and defense domains. Despite the availability of real and synthetic datasets, existing AI models that achieve high accuracy often over-rely on seafloor features, leading to poor generalization. To mitigate this issue, we propose a novel framework that integrates two key modules: (i) a Targeted Contrastive Unlearning (TCU) module, which extends the traditional triplet loss to reduce seafloor-induced background bias and improve generalization, and (ii) the Unlearn to Explain Sonar Framework (UESF), which provides visual insights into what the model has deliberately forgotten while adapting the LIME explainer to generate more faithful and localized attributions for unlearning evaluation. Extensive experiments across both real and synthetic sonar datasets validate our approach, demonstrating significant improvements in unlearning effectiveness, model robustness, and interpretability.
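The abstract describes TCU only as an extension of the traditional triplet loss and does not state the formula. As a rough illustration of that idea, the sketch below pairs the standard triplet loss with one possible extra "forget" term that penalizes an anchor embedding for lying close to a seafloor-only embedding. The names `targeted_unlearning_loss`, `background`, `bg_margin`, and `bg_weight` are assumptions made for this example and are not taken from the paper; the authors' actual loss may differ.

```python
from typing import Sequence

Vector = Sequence[float]


def sq_dist(u: Vector, v: Vector) -> float:
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def triplet_loss(anchor: Vector, positive: Vector, negative: Vector,
                 margin: float = 1.0) -> float:
    """Standard triplet loss: pull the positive closer than the negative by `margin`."""
    return max(sq_dist(anchor, positive) - sq_dist(anchor, negative) + margin, 0.0)


def targeted_unlearning_loss(anchor: Vector, positive: Vector, negative: Vector,
                             background: Vector, margin: float = 1.0,
                             bg_margin: float = 1.0, bg_weight: float = 1.0) -> float:
    """Hypothetical extension (an assumption, not the paper's formula).

    Adds a hinge term that is non-zero only while the anchor embedding sits
    within `bg_margin` (squared distance) of a seafloor-only embedding.
    """
    forget = max(bg_margin - sq_dist(anchor, background), 0.0)
    return triplet_loss(anchor, positive, negative, margin) + bg_weight * forget


# The triplet term is already satisfied here, so the loss comes from the forget term.
near_seafloor = targeted_unlearning_loss([0.0, 0.0], [0.0, 0.0], [3.0, 0.0], [0.5, 0.0])
far_seafloor = targeted_unlearning_loss([0.0, 0.0], [0.0, 0.0], [3.0, 0.0], [2.0, 0.0])
print(near_seafloor, far_seafloor)
```

In this sketch the extra term vanishes once the anchor is at least `bg_margin` away from the background embedding, so it leaves the ordinary triplet objective untouched for samples that no longer resemble the seafloor.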
