EoS-FM: Can an Ensemble of Specialist Models act as a Generalist Feature Extractor?
Pierre Adorni, Minh-Tan Pham, Stéphane May, Sébastien Lefèvre
TL;DR
The paper addresses the resource-heavy paradigm of Earth Observation foundation models by proposing EoS-FM, an Ensemble-of-Specialists framework that aggregates multiple lightweight, task-specific ConvNeXtV2-Atto encoders. The encoders stay frozen during downstream training; a differentiable selection layer and a 1x1 fusion layer combine their outputs into compact representations, giving strong performance across 11 remote sensing (RS) tasks with significantly fewer parameters. The method remains robust under label scarcity, can be pruned with a top-k mechanism to produce compact variants, and naturally supports federated training. The work emphasizes modularity, efficiency, and sustainability, offering a practical path toward general-purpose Remote Sensing Foundation Models (RSFMs), with code and pretrained models released as open source.
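The selection, fusion, and top-k pruning steps described above can be illustrated with a minimal sketch for a single pixel. This is not the authors' implementation: the softmax gating, the function names (`select_top_k`, `fuse_pixel`), and the choice to zero out pruned specialists without renormalizing are all assumptions made for illustration. The only fact relied on is that a 1x1 fusion applies the same linear map to the concatenated channels at every spatial location.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def select_top_k(logits, k):
    """Indices of the k specialists with the largest selection weights."""
    weights = softmax(logits)
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return sorted(ranked[:k])


def fuse_pixel(specialist_feats, logits, fusion_weights, keep=None):
    """Gate each frozen specialist's features, concatenate, then apply a linear map.

    specialist_feats: N feature vectors (one per specialist) for one pixel.
    logits:           N selection logits (hypothetical parameterization).
    fusion_weights:   D rows of length N * C, standing in for a 1x1 fusion.
    keep:             optional indices of specialists retained after pruning.
    """
    gates = softmax(logits)
    kept = set(range(len(specialist_feats))) if keep is None else set(keep)
    gated = []
    for i, feat in enumerate(specialist_feats):
        # Assumption: a pruned specialist contributes zeros; a real pruned
        # model would skip running that encoder altogether.
        g = gates[i] if i in kept else 0.0
        gated.extend(g * x for x in feat)
    return [sum(w * x for w, x in zip(row, gated)) for row in fusion_weights]


# Toy example: 3 specialists, 2 channels each, fused down to 2 channels.
feats = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
logits = [2.0, 0.1, 1.0]
fusion = [[1.0] * 6, [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]]

kept = select_top_k(logits, k=2)  # specialists 0 and 2 have the largest logits
full = fuse_pixel(feats, logits, fusion)
compact = fuse_pixel(feats, logits, fusion, keep=kept)
```

Because the selection weights are produced by a differentiable function in this sketch, they could be learned jointly with the fusion weights while the specialists stay frozen; ranking them afterwards gives one plausible way to choose which specialists to drop.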
Abstract
Recent advances in foundation models have shown great promise in domains such as natural language processing and computer vision, and similar efforts are now emerging in the Earth Observation community. These models aim to generalize across tasks with limited supervision, reducing the need to train a separate model for each task. However, current strategies, which largely focus on scaling model size and dataset volume, require prohibitive computational and data resources, limiting accessibility to only a few large institutions. Moreover, this paradigm of ever-larger models stands in stark contrast with the principles of sustainable and environmentally responsible AI, as it leads to immense carbon footprints and resource inefficiency. In this work, we present a novel and efficient alternative: an Ensemble-of-Specialists framework for building Remote Sensing Foundation Models (RSFMs). Our method decomposes the training process into lightweight, task-specific ConvNeXtV2 specialists that can be frozen and reused. This modular approach offers strong advantages in efficiency, interpretability, and extensibility. It also naturally supports federated training, pruning, and continuous specialist integration, making it particularly well-suited for collaborative and resource-constrained settings. Our framework sets a new direction for building scalable and efficient RSFMs. All code and pretrained models are available at https://github.com/pierreadorni/EoS-FM.
