Masked Autoencoder Joint Learning for Robust Spitzoid Tumor Classification

Ilán Carretero, Roshni Mahtani, Silvia Perez-Deben, José Francisco González-Muñoz, Carlos Monteagudo, Valery Naranjo, Rocío del Amor

TL;DR

This work targets robust classification of spitzoid tumors from DNA methylation data with missing values. It introduces ReMAC, a Transformer-based masked autoencoder that extends ReMasker with a Mean Aggregation Classification (MAC) head to learn missingness-invariant discriminative representations. The model optimizes $\mathcal{L}_{\mathrm{ReMAC}}=\mathcal{L}_{\mathrm{REC}}+\mathcal{L}_{\mathrm{CLF}}$, which combines a reconstruction loss over observed and masked indices with a binary cross-entropy (BCE) classification loss on mean-pooled decoder embeddings. Evaluated on 21 FFPE samples, ReMAC performs strongly under both complete and incomplete data regimes and outperforms multiple baselines; the code is publicly available.
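The joint objective can be made concrete with a minimal NumPy sketch. It assumes mean-squared error for $\mathcal{L}_{\mathrm{REC}}$, equal weighting of the reconstruction terms on visible and re-masked observed entries, and a single linear layer after mean pooling; all function names (`mse_over`, `bce_with_logits`, `mac_head`, `remac_loss`) are hypothetical and not taken from the ReMAC repository.

```python
import numpy as np

def mse_over(x, x_hat, mask):
    """Mean squared error restricted to entries where mask is True (0.0 if none)."""
    n = int(mask.sum())
    if n == 0:
        return 0.0
    return float(((x - x_hat) ** 2)[mask].sum() / n)

def bce_with_logits(logits, labels):
    """Binary cross-entropy on raw logits, averaged over the batch.

    Uses log(1 + exp(z)) - y * z, evaluated stably with np.logaddexp.
    """
    return float(np.mean(np.logaddexp(0.0, logits) - labels * logits))

def mac_head(decoder_emb, w, b):
    """Hypothetical MAC head: mean over tokens, then one linear layer (assumption).

    decoder_emb: (batch, n_tokens, d), w: (d,), b: scalar -> logits of shape (batch,)
    """
    pooled = decoder_emb.mean(axis=1)
    return pooled @ w + b

def remac_loss(x, x_hat, observed, remasked, decoder_emb, w, b, labels):
    """Sketch of L_ReMAC = L_REC + L_CLF (MSE reconstruction, equal weights assumed)."""
    visible = observed & ~remasked   # measured values the encoder saw
    hidden = observed & remasked     # measured values hidden by re-masking
    # Truly missing entries have no ground truth, so they never enter L_REC.
    l_rec = mse_over(x, x_hat, visible) + mse_over(x, x_hat, hidden)
    l_clf = bce_with_logits(mac_head(decoder_emb, w, b), labels)
    return l_rec + l_clf, l_rec, l_clf

# Toy usage: one sample with four features, one of them missing (NaN).
x = np.array([[0.2, np.nan, 0.8, 0.5]])
observed = ~np.isnan(x)
remasked = np.array([[False, False, True, False]])
x_hat = np.array([[0.3, 0.4, 0.6, 0.5]])
emb = np.zeros((1, 4, 3))
total, l_rec, l_clf = remac_loss(x, x_hat, observed, remasked,
                                 emb, np.zeros(3), 0.0, np.array([1.0]))
# l_rec is about 0.045 and l_clf equals ln 2 (about 0.693) for a zero logit
```

In this sketch the missing entry influences only the prediction `x_hat`, never the loss, which is what allows training directly on incomplete profiles.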

Abstract

Accurate diagnosis of spitzoid tumors (ST) is critical to ensure a favorable prognosis and to avoid both under- and over-treatment. Epigenetic data, particularly DNA methylation, provide a valuable source of information for this task. However, prior studies assume complete data, which is unrealistic because methylation profiles frequently contain missing entries due to limited coverage and experimental artifacts. Our work moves beyond this favorable assumption and introduces ReMAC, an extension of ReMasker designed for classification of high-dimensional data under both complete and incomplete regimes. Evaluation on real clinical data shows that ReMAC achieves strong and robust performance in the stratification of ST compared with competing classification methods. Code is available: https://github.com/roshni-mahtani/ReMAC.
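ReMAC builds on ReMasker, which during training hides an additional random subset of the observed values so that the model learns to reconstruct entries it did not see. The NumPy sketch below illustrates that re-masking step, with the mask ratio (MR, as ablated in Figure 3) applied per sample. The function name `remask` and the per-row rounding rule are illustrative assumptions, not code from the ReMAC repository.

```python
import numpy as np

def remask(observed, mask_ratio, rng):
    """Hide a random fraction of each row's observed entries (ReMasker-style re-masking).

    observed: bool array (batch, n_features), True where a value was measured.
    Returns a bool array marking which observed entries are artificially masked.
    """
    remasked = np.zeros_like(observed, dtype=bool)
    for i, row in enumerate(observed):
        idx = np.flatnonzero(row)                  # indices of measured values
        k = int(round(mask_ratio * idx.size))      # how many of them to hide
        if k > 0:
            hide = rng.choice(idx, size=k, replace=False)
            remasked[i, hide] = True
    return remasked

rng = np.random.default_rng(0)
observed = np.array([[True, False, True, True, True],
                     [True, True, True, True, False]])
remasked = remask(observed, mask_ratio=0.5, rng=rng)
visible = observed & ~remasked   # what the encoder would actually receive
# each row has 4 observed values, so 2 are hidden per row
```

Missing entries are never selected for re-masking because only observed indices are sampled; the encoder therefore sees neither the missing nor the re-masked values, while only the re-masked ones have ground truth for the reconstruction loss.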

Paper Structure

This paper contains 15 sections, 2 equations, 3 figures, 2 tables.

Figures (3)

  • Figure 1: Method Overview. In this article, we introduce ReMAC, a framework that extends the ReMasker approach by incorporating an aggregation and classification head to learn missingness-invariant discriminative representations for the stratification of spitzoid tumors into Spitz nevus and Spitz melanoma.
  • Figure 2: Comparison of different representation strategies for decoder embeddings under varying missing-value (%MV) regimes. Accuracy (ACC) is reported for ReCLS (learnable token), ReMaxAC (max pooling), and ReMAC (mean pooling, ours).
  • Figure 3: Ablation Studies on ReMAC: impact of the classification head complexity and the mask ratio (MR) under varying missing-value regimes. In both subfigures, the configuration adopted in the main experiments is highlighted in bold in the legend.
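The three representation strategies compared in Figure 2 differ only in how per-token decoder embeddings are collapsed into one vector per sample. A minimal NumPy sketch follows; the function name is hypothetical, and for the learnable-token variant it assumes the token was prepended at position 0 before the decoder ran, so its output is simply read out.

```python
import numpy as np

def pool_decoder_embeddings(emb, strategy):
    """Collapse decoder embeddings of shape (batch, n_tokens, d) into (batch, d)."""
    if strategy == "mean":   # ReMAC: average over all feature tokens
        return emb.mean(axis=1)
    if strategy == "max":    # ReMaxAC-style: element-wise max over tokens
        return emb.max(axis=1)
    if strategy == "cls":    # ReCLS-style: output of a learnable token assumed at index 0
        return emb[:, 0, :]
    raise ValueError(f"unknown strategy: {strategy!r}")

emb = np.array([[[1.0, 4.0],
                 [3.0, 0.0],
                 [2.0, 2.0]]])   # one sample, three tokens, d = 2
mean_vec = pool_decoder_embeddings(emb, "mean")   # [[2.0, 2.0]]
max_vec = pool_decoder_embeddings(emb, "max")     # [[3.0, 4.0]]
cls_vec = pool_decoder_embeddings(emb, "cls")     # [[1.0, 4.0]]
```

Mean pooling weights every feature token equally, max pooling keeps only the largest activation per embedding dimension, and the learnable-token readout relies on attention inside the network to gather information into a single position.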