Masked Autoencoder Joint Learning for Robust Spitzoid Tumor Classification
Ilán Carretero, Roshni Mahtani, Silvia Perez-Deben, José Francisco González-Muñoz, Carlos Monteagudo, Valery Naranjo, Rocío del Amor
TL;DR
This work targets robust spitzoid tumor classification from DNA methylation data in the presence of missing values. It introduces ReMAC, a masked autoencoder that extends ReMasker with a Mean Aggregation Classification (MAC) head, using a Transformer-based architecture to learn discriminative representations that are invariant to missingness. The training objective $\mathcal{L}_{\mathrm{ReMAC}}=\mathcal{L}_{\mathrm{REC}}+\mathcal{L}_{\mathrm{CLF}}$ combines a reconstruction loss over observed and masked indices with a binary cross-entropy (BCE) classification loss computed on mean-pooled decoder embeddings. Evaluated on 21 formalin-fixed paraffin-embedded (FFPE) samples, ReMAC performs strongly under both complete and incomplete data regimes and outperforms multiple baselines. Code is available at the provided repository.
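The objective described above can be sketched in Python as follows. This is a minimal, hypothetical illustration, not the authors' implementation: the Transformer encoder and decoder are omitted, so the reconstruction `x_hat` and the per-token decoder embeddings are placeholders. The helper names (`remask`, `masked_mse`, `mac_head`, `remac_loss`), the mean-squared-error reconstruction term, the 0.5 re-masking ratio, the unweighted sum of the two reconstruction terms, and mean pooling over all feature tokens are all assumptions.

```python
import numpy as np


def remask(x, mask_ratio, rng):
    """ReMasker-style split (assumed): randomly re-mask a fraction of the
    *observed* entries; missing entries (NaN) are never used as targets."""
    observed = ~np.isnan(x)
    masked = (rng.random(x.shape) < mask_ratio) & observed
    visible = observed & ~masked
    return visible, masked


def masked_mse(x, x_hat, idx):
    """Mean squared error restricted to entries where idx is True (assumed MSE)."""
    n = idx.sum()
    if n == 0:
        return 0.0
    # nan_to_num keeps NaNs out of the arithmetic; idx only selects observed entries
    diff = np.where(idx, x_hat - np.nan_to_num(x), 0.0)
    return float((diff ** 2).sum() / n)


def reconstruction_loss(x, x_hat, visible, masked):
    """L_REC: error on visible observed entries plus error on re-masked entries
    (unweighted sum, an assumption)."""
    return masked_mse(x, x_hat, visible) + masked_mse(x, x_hat, masked)


def mac_head(decoder_emb, w, b):
    """Hypothetical MAC head: mean-pool token embeddings (n, tokens, d) -> (n, d),
    then a linear layer and a sigmoid to get class probabilities."""
    pooled = decoder_emb.mean(axis=1)
    logits = pooled @ w + b
    return 1.0 / (1.0 + np.exp(-logits))


def bce(p, y, eps=1e-7):
    """Binary cross-entropy, clipped for numerical stability."""
    p = np.clip(p, eps, 1 - eps)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))


def remac_loss(x, x_hat, visible, masked, decoder_emb, y, w, b):
    """L_ReMAC = L_REC + L_CLF."""
    return reconstruction_loss(x, x_hat, visible, masked) + bce(mac_head(decoder_emb, w, b), y)


# Toy usage: 4 samples, 6 features (one token per feature), embedding size 8.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6))
x[0, 1] = np.nan  # simulate missing methylation values
x[2, 4] = np.nan
visible, masked = remask(x, mask_ratio=0.5, rng=rng)

x_hat = np.nan_to_num(x)            # placeholder: perfect reconstruction of observed values
emb = rng.normal(size=(4, 6, 8))    # placeholder decoder embeddings
y = np.array([0, 1, 0, 1])          # binary labels
w, b = np.zeros(8), 0.0             # untrained classifier -> p = 0.5 for every sample

loss = remac_loss(x, x_hat, visible, masked, emb, y, w, b)
print(f"L_ReMAC = {loss:.4f}")
```

With a perfect reconstruction and an all-zero classifier, the reconstruction term vanishes and the BCE term equals ln 2 (about 0.693), which makes the sketch easy to sanity-check. Restricting both reconstruction terms to observed entries is what lets the model train on incomplete profiles without ever imputing targets.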
Abstract
Accurate diagnosis of spitzoid tumors (ST) is critical to ensure a favorable prognosis and to avoid both under- and over-treatment. Epigenetic data, particularly DNA methylation, provide a valuable source of information for this task. However, prior studies assume complete data, which is unrealistic because methylation profiles frequently contain missing entries due to limited coverage and experimental artifacts. Our work moves beyond this idealized setting and introduces ReMAC, an extension of ReMasker designed for classification of high-dimensional data under both complete and incomplete regimes. Evaluation on real clinical data shows that ReMAC achieves strong and robust performance in ST stratification compared with competing classification methods. Code is available: https://github.com/roshni-mahtani/ReMAC.
