Self-Guided Masked Autoencoders for Domain-Agnostic Self-Supervised Learning
Johnathan Xie, Yoonho Lee, Annie S. Chen, Chelsea Finn
TL;DR
Self-Guided Masked Autoencoders (SMA) introduce a fully domain-agnostic masked modeling approach that learns masking policies from the model's own attention maps, removing the need for domain-specific tokenizers or priors. SMA trains a single attention-based model to reconstruct masked raw inputs, deriving the masks from the model's cross- or self-attention, and learns strong representations across protein biology, chemistry, and particle physics. The method achieves state-of-the-art performance on these scientific benchmarks and is competitive with domain-specific masking on NLP and image datasets, suggesting that useful masking structure can be discovered purely from unlabeled data. Overall, SMA offers a broadly applicable path to unsupervised representation learning without hand-crafted priors, using attention to induce meaningful masks that transfer well to downstream tasks.
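The attention-guided masking step can be illustrated with a minimal NumPy sketch. It assumes a Perceiver-style cross-attention map of shape (latents, inputs) whose rows sum to 1. It selects a random subset of latent queries and masks the input tokens those latents attend to most. The function name attention_guided_mask and its parameters (mask_ratio, num_selected) are hypothetical, and the summed-score selection rule is an assumption, not the authors' exact procedure.

```python
import numpy as np


def attention_guided_mask(attn, mask_ratio, num_selected, rng):
    """Return a boolean mask over input tokens (True = masked).

    Hypothetical helper, not the authors' code.
    attn: (num_latents, num_inputs) attention map; rows assumed softmax-normalized.
    mask_ratio: fraction of input tokens to hide.
    num_selected: how many latent queries to sample (assumed knob).
    """
    num_latents, num_inputs = attn.shape
    num_masked = int(round(mask_ratio * num_inputs))
    # Sample a random subset of latent queries without replacement.
    chosen = rng.choice(num_latents, size=num_selected, replace=False)
    # Assumed rule: sum the chosen latents' attention to score each input token.
    scores = attn[chosen].sum(axis=0)
    # Hide the highest-scoring input tokens.
    top = np.argsort(-scores)[:num_masked]
    mask = np.zeros(num_inputs, dtype=bool)
    mask[top] = True
    return mask


# Usage: a random attention map over 8 latents and 32 input tokens.
rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 32))
attn = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
mask = attention_guided_mask(attn, mask_ratio=0.25, num_selected=2, rng=rng)
print(mask.sum())  # 8
```

Because the mask comes from the model's own attention rather than from tokenizer boundaries or image patches, the same function applies unchanged to protein sequences, molecules, or particle sets. Only the attention map differs.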
Abstract
Self-supervised learning excels at learning representations from large amounts of unlabeled data, demonstrating success across multiple data modalities. Yet extending self-supervised learning to new modalities is non-trivial because the specifics of existing methods are tailored to each domain, such as domain-specific augmentations that reflect the invariances in the target task. While masked modeling is promising as a domain-agnostic framework for self-supervised learning because it does not rely on input augmentations, its mask sampling procedure remains domain-specific. We present Self-guided Masked Autoencoders (SMA), a fully domain-agnostic masked modeling method. SMA trains an attention-based model with a masked modeling objective and learns which masks to sample without any domain-specific assumptions. We evaluate SMA on three self-supervised learning benchmarks in protein biology, chemical property prediction, and particle physics. We find that SMA learns representations without domain-specific knowledge and achieves state-of-the-art performance on all three benchmarks.
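Given any mask, the masked modeling objective can be sketched as a reconstruction loss computed only over the hidden tokens. This minimal example makes three assumptions: inputs are continuous-valued tokens, masked positions are replaced by a constant placeholder, and the loss is mean squared error. The helper names apply_mask and masked_reconstruction_loss are hypothetical. For categorical inputs, a cross-entropy loss would be the natural substitute.

```python
import numpy as np


def apply_mask(inputs, mask, mask_value=0.0):
    """Replace masked tokens with a constant placeholder (assumed choice)."""
    masked = inputs.copy()
    masked[mask] = mask_value
    return masked


def masked_reconstruction_loss(inputs, reconstruction, mask):
    """Mean squared error over masked tokens only (hypothetical helper).

    inputs, reconstruction: (num_tokens, dim) arrays.
    mask: (num_tokens,) boolean array, True where the input was hidden.
    """
    if not mask.any():
        raise ValueError("mask must hide at least one token")
    diff = reconstruction[mask] - inputs[mask]
    return float(np.mean(diff ** 2))


# Usage: 4 tokens of dimension 3, with tokens 1 and 3 hidden from the encoder.
x = np.arange(12, dtype=float).reshape(4, 3)
mask = np.array([False, True, False, True])
x_in = apply_mask(x, mask)      # what the model would see
recon = x.copy()
recon[1] += 1.0                 # stand-in model output, off by 1 on token 1
print(masked_reconstruction_loss(x, recon, mask))  # 0.5
```

Restricting the loss to masked positions keeps the model from earning credit for copying visible inputs. It must instead predict hidden content from context.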
