Self-Masking Networks for Unsupervised Adaptation
Alfonso Taboada Warmerdam, Mathilde Caron, Yuki M. Asano
TL;DR
The paper addresses adapting large pretrained vision models to downstream tasks when labeled data is scarce, while keeping storage low by learning binary masks over the pretrained weights. It introduces Self-Masking Networks (SMNs), which learn a mask M over the weights using a self-supervised loss. Each weight θ_i has a learned score S_i that is compared against a threshold μ, and a normalization factor α rescales the surviving weights to preserve variance: M_i = I[S_i > μ], α = √((1/N) Σ_i I[S_i > μ]), and θ_i' = (θ_i/α) M_i, so that α² is the fraction of the N weights that the mask keeps. Key contributions include a hyperparameter-free masking design that is invariant to certain parameter shifts, a label-free adaptation strategy based on a SwAV-style clustering objective, and a model cascade framework that trains multiple expert masks and fuses their embeddings with PCA to improve downstream accuracy under limited supervision. These approaches yield up to 79x storage efficiency and competitive performance across eight datasets and three architectures. The work reports strong results in label-efficient and semi-supervised regimes and shows that cascades give consistent accuracy gains (e.g., several points in linear probing) while retaining substantial storage advantages, offering a scalable path for deploying foundation models with minimal labeled data.
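The masking rule above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name apply_self_mask, the default threshold mu=0.0, and the choice to treat one flat array as the N weights are assumptions made for illustration, and the training of the scores S is not shown.

```python
import numpy as np


def apply_self_mask(theta, scores, mu=0.0):
    """Apply M_i = I[S_i > mu], alpha = sqrt(mean(M)), theta'_i = (theta_i / alpha) * M_i.

    theta  : frozen pretrained weights (any shape)
    scores : learned scores S, same shape as theta
    mu     : threshold (the default of 0.0 is an assumption of this sketch)
    """
    theta = np.asarray(theta, dtype=np.float64)
    scores = np.asarray(scores, dtype=np.float64)
    if theta.shape != scores.shape:
        raise ValueError("theta and scores must have the same shape")

    mask = (scores > mu).astype(np.float64)  # binary mask M
    alpha = np.sqrt(mask.mean())             # sqrt of the fraction of weights kept
    if alpha == 0.0:
        raise ValueError("mask removes every weight; alpha would be zero")

    masked_theta = (theta / alpha) * mask    # rescale survivors, zero out the rest
    return masked_theta, mask, alpha


def pack_mask(mask):
    """Pack a binary mask into bytes, one bit per weight (illustrative storage format)."""
    return np.packbits(mask.astype(np.uint8).ravel())


if __name__ == "__main__":
    theta = np.array([1.0, 2.0, 3.0, 4.0])
    scores = np.array([0.5, -0.1, 0.2, -0.3])
    masked_theta, mask, alpha = apply_self_mask(theta, scores)
    print("mask:", mask)
    print("alpha:", alpha)
    print("masked weights:", masked_theta)
    print("bytes needed for the packed mask:", pack_mask(mask).nbytes)
```

Because only the binary mask differs between tasks, a single copy of the pretrained weights can be shared, and each task then needs one bit per weight plus whatever small metadata the actual method stores. Dividing by α compensates for the weights set to zero: for zero-mean weights, the variance of the masked tensor stays close to that of the original.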
Abstract
With the advent of billion-parameter foundation models, efficient fine-tuning has become increasingly important for the adaptation of models to downstream tasks. However, especially in computer vision, it can be hard to achieve good performance when access to quality labeled data is lacking. In this work, we propose a method adapting pretrained generalist models in a self-supervised manner by learning binary masks. These self-supervised masking networks (SMNs) are up to 79x more efficient to store and significantly improve performance on label-efficient downstream tasks. We validate the usefulness of learning binary masks as a fine-tuning method on 8 datasets and 3 model architectures, and we demonstrate the effectiveness of SMNs in 3 label-efficient settings.
