Self-supervised learning for classifying paranasal anomalies in the maxillary sinus

Debayan Bhattacharya, Finn Behrendt, Benjamin Tobias Becker, Lennart Maack, Dirk Beyersdorff, Elina Petersen, Marvin Petersen, Bastian Cheng, Dennis Eggert, Christian Betz, Anna Sophie Hoffmann, Alexander Schlaefer

TL;DR

This work tackles the challenge of classifying paranasal anomalies in the maxillary sinus from 3D MRI where labeled data are scarce. It introduces a self-supervised pipeline that uses a 3D convolutional autoencoder trained on healthy MS data to generate residuals on unlabeled scans, guiding a self-supervised 3D CNN task whose encoder is then fine-tuned on labeled data for binary classification. The method explicitly optimizes anomaly localization in the SSL stage, leading to superior downstream performance, particularly in low-data settings (e.g., 10–20% labeled data), and surpasses strong SSL baselines like BYOL, SimSiam, SimCLR, and SparK. The approach demonstrates data efficiency and practical potential for clinical deployment, with code available for replication.

Abstract

Purpose: Paranasal anomalies, frequently identified in routine radiological screenings, exhibit diverse morphological characteristics. Because of this diversity, supervised learning methods require large labelled datasets that cover a wide range of anomaly morphologies. Self-supervised learning (SSL) can be used to learn representations from unlabelled data. However, there are no SSL methods designed for the downstream task of classifying paranasal anomalies in the maxillary sinus (MS). Methods: Our approach uses a 3D Convolutional Autoencoder (CAE) trained in an unsupervised anomaly detection (UAD) framework. Initially, we train the 3D CAE to reduce reconstruction errors when reconstructing normal MS images. Then, this CAE is applied to an unlabelled dataset to generate coarse anomaly locations by creating residual MS images. Following this, a 3D Convolutional Neural Network (CNN) reconstructs these residual images, which forms our SSL task. Lastly, we fine-tune the encoder part of the 3D CNN on a labelled dataset of normal and anomalous MS images. Results: The proposed SSL technique exhibits superior performance compared to existing generic self-supervised methods, especially in scenarios with limited annotated data. When trained on just 10% of the annotated dataset, our method achieves an Area Under the Precision-Recall Curve (AUPRC) of 0.79 for the downstream classification task. This performance surpasses other methods, with BYOL attaining an AUPRC of 0.75, SimSiam 0.74, SimCLR 0.73 and Masked Autoencoding using SparK 0.75. Conclusion: A self-supervised learning approach that inherently focuses on localizing paranasal anomalies proves to be advantageous, particularly when the subsequent task involves differentiating normal from anomalous maxillary sinuses. Our code is available at https://github.com/mtec-tuhh/self-supervised-paranasal-anomaly
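The residual-generation step in the Methods can be illustrated with a small sketch. It assumes the residual is the voxel-wise absolute difference between a volume and its CAE reconstruction, which the abstract does not spell out. `stub_cae` is a hypothetical stand-in for the trained 3D CAE that always outputs an all-dark "normal" volume, and nested lists replace real image arrays to keep the example dependency-free.

```python
def make_volume(depth, height, width, fill=0.0):
    """Build a depth x height x width volume as nested lists."""
    return [[[fill] * width for _ in range(height)] for _ in range(depth)]


def residual_volume(volume, reconstruction):
    """Voxel-wise absolute difference (assumed residual definition)."""
    return [
        [[abs(v - r) for v, r in zip(row_v, row_r)]
         for row_v, row_r in zip(plane_v, plane_r)]
        for plane_v, plane_r in zip(volume, reconstruction)
    ]


def stub_cae(volume):
    """Hypothetical stand-in for the trained CAE: it can only produce a
    'normal' (all-dark) volume, so anomalies are not reconstructed."""
    depth, height, width = len(volume), len(volume[0]), len(volume[0][0])
    return make_volume(depth, height, width, fill=0.0)


# Toy unlabelled scan: dark background with a bright 2x2x2 "anomaly".
scan = make_volume(4, 4, 4)
for z in (1, 2):
    for y in (1, 2):
        for x in (1, 2):
            scan[z][y][x] = 1.0

residual = residual_volume(scan, stub_cae(scan))

# Voxels the residual flags as poorly reconstructed (threshold is illustrative).
highlighted = sum(v > 0.5 for plane in residual for row in plane for v in row)
```

In this toy case the residual is non-zero only where the stand-in fails to reproduce the bright region, which is the sense in which residual volumes provide coarse anomaly locations for the SSL reconstruction task.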

Paper Structure (13 sections, 3 figures, 3 tables)


Figures (3)

  • Figure 1: a) Extraction of MS volumes from cranial MRI. b) Exemplary coronal images of a normal MS volume and of MS volumes with mucosal thickening, polyp and cyst anomalies. c) Our CAE architecture. Here, k refers to kernel size, s to stride, p to padding and c to channels; for example, 1/16 denotes 1 input channel and 16 output channels. Each stage of the encoder and decoder consists of a 3D convolution followed by batch normalisation and leaky ReLU. Upsample refers to trilinear upsampling. d) Generation of the residual volume required for the self-supervision task using our CAE. e) Our self-supervision task, in which the encoder and decoder are trained to reconstruct the residual volume. f) Downstream task, in which the encoder pretrained with self-supervision is trained to classify normal versus anomalous MS.
  • Figure 2: Our data processing pipeline comprises several steps: a) The labelled dataset $D_l$. b) Splitting $D_l$ into training, validation, and test subsets for downstream classification of normal versus anomalous MS. c) Normal MS samples from the labelled training set form $D_{l}^{n}$, used to train the 3D CAE $A(.)$ within the UAD framework. d) Unlabelled dataset $D_u$. e) The trained 3D CAE $A(.)$ generates residual volumes from $D_u$, yielding an unlabelled dataset of residual volumes. f) The 3D CNN undergoes self-supervised training to reconstruct these residual volumes. g) The 3D CNN's encoder is initialized with weights from the SSL task, then undergoes supervised training for the final task of classifying normal versus anomalous MS, using the training set created in step b).
  • Figure 3: (Left) AUPRC versus training set percentage. (Right) AUROC versus training set percentage.
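For readers unfamiliar with the AUPRC metric reported in the abstract and plotted in Figure 3, the sketch below computes average precision, one common estimator of the area under the precision-recall curve. The source does not state which estimator the authors used, so this is an illustration only; the labels and scores are made-up toy values, not results from the paper, and the function assumes binary labels (1 = anomalous) with no tied scores.

```python
def average_precision(labels, scores):
    """Average precision: mean of precision@k over the ranks k at which
    a positive sample appears, with samples sorted by descending score."""
    if len(labels) != len(scores):
        raise ValueError("labels and scores must have the same length")
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("at least one positive label is required")

    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


# Toy example (hypothetical values): 1 = anomalous MS, 0 = normal MS.
toy_labels = [1, 0, 1, 0, 1]
toy_scores = [0.9, 0.8, 0.7, 0.4, 0.3]
ap = average_precision(toy_labels, toy_scores)

# A classifier that ranks every anomalous sample first scores 1.0.
perfect_ap = average_precision([1, 1, 0, 0], [0.9, 0.8, 0.2, 0.1])
```

Unlike AUROC, this quantity depends on the share of positive samples, which is one reason the two panels of Figure 3 can show different trends for the same models.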