DailyMAE: Towards Pretraining Masked Autoencoders in One Day
Jiantao Wu, Shentong Mo, Sara Atito, Zhenhua Feng, Josef Kittler, Muhammad Awais
TL;DR
This paper tackles the high computational cost of pretraining with masked image modeling (MIM), a self-supervised learning (SSL) method, by introducing efficient training recipes that mitigate data-loading bottlenecks and apply progressive training. It presents an enhanced FFCV-based data pipeline (ESSL) and progressive resolution strategies to accelerate MAE pretraining, training MAE-Base/16 on ImageNet-1K for 800 epochs in about 18 hours on a single machine with 8 A100 GPUs, a speedup of up to 5.8×. The authors also propose a comprehensive finetuning recipe with Three Augmentations and standardized validation, and they systematically study compression, data shifts, and dynamic resizing during both finetuning and pretraining. The resulting framework lowers the barrier to SSL research and rapid prototyping, while highlighting trade-offs in data compression and resolution that affect accuracy and efficiency. Overall, the work provides a practical toolkit for fast, iterative SSL experimentation on limited hardware and makes MAE-style pretraining research more broadly accessible.
Abstract
Recently, masked image modeling (MIM), an important self-supervised learning (SSL) method, has drawn attention for its effectiveness in learning data representations from unlabeled data. Numerous studies underscore the advantages of MIM, highlighting how models pretrained on extensive datasets can enhance performance on downstream tasks. However, the high computational demands of pretraining pose significant challenges, particularly in academic environments, thereby impeding progress in SSL research. In this study, we propose efficient training recipes for MIM-based SSL that focus on mitigating data-loading bottlenecks and on employing progressive training techniques and other tricks to closely maintain pretraining performance. Our library enables the training of an MAE-Base/16 model on the ImageNet-1K dataset for 800 epochs within just 18 hours, using a single machine equipped with 8 A100 GPUs. By achieving speed gains of up to 5.8 times, this work not only demonstrates the feasibility of high-efficiency SSL training but also paves the way for broader accessibility and promotes advancement in SSL research, particularly for prototyping and initial testing of SSL ideas. The code is available at https://github.com/erow/FastSSL.
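
To make the headline numbers concrete, the short calculation below estimates the data throughput they imply, which helps explain why data loading becomes a bottleneck. It is a rough sketch, not a measurement from the paper: the training-set size of 1,281,167 images is the standard ImageNet-1K figure and is an assumption not stated above, and the function name `implied_throughput` is purely illustrative.

```python
# Back-of-the-envelope throughput implied by the abstract's headline numbers.
# Assumption (not stated in the text above): the ImageNet-1K training split
# contains 1,281,167 images.

IMAGENET_1K_TRAIN_IMAGES = 1_281_167  # assumed dataset size
EPOCHS = 800      # from the abstract
WALL_HOURS = 18   # from the abstract
NUM_GPUS = 8      # from the abstract


def implied_throughput(images, epochs, hours, gpus):
    """Return (seconds per epoch, images/s overall, images/s per GPU)."""
    seconds_per_epoch = hours * 3600 / epochs
    images_per_second = images / seconds_per_epoch
    return seconds_per_epoch, images_per_second, images_per_second / gpus


sec_per_epoch, img_per_s, img_per_s_per_gpu = implied_throughput(
    IMAGENET_1K_TRAIN_IMAGES, EPOCHS, WALL_HOURS, NUM_GPUS
)
print(
    f"{sec_per_epoch:.0f} s/epoch, "
    f"{img_per_s:,.0f} img/s, "
    f"{img_per_s_per_gpu:,.0f} img/s per GPU"
)
```

Under these assumptions, each epoch has to finish in roughly 81 seconds, so the input pipeline must decode and augment on the order of 16,000 images per second across the machine. This is an average over the whole run; with progressive resolution the per-epoch cost would vary.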
