LandSegmenter: Towards a Flexible Foundation Model for Land Use and Land Cover Mapping
Chenying Liu, Wei Huang, Xiao Xiang Zhu
TL;DR
LandSegmenter tackles the generalization bottleneck in LULC mapping by introducing a task-specific foundation model trained on LAS, a large and predominantly weakly labeled dataset, enabling flexible multi-modal inputs and adaptive outputs. It combines an RS-adaptive encoder, a GeoRSCLIP-based text prompter, and a vision-text decoder with an AFM and high-frequency/spectral enhancements, along with a confidence-guided fusion strategy to improve zero-shot performance. Key contributions include constructing LAS (≈150k samples across eight subsets with ~80% weak labels), designing the three-part LandSegmenter architecture, and demonstrating strong zero-shot and fine-tuning results across six diverse LULC datasets. The approach highlights the practical value of weak supervision for scaling task-specific FMs in Earth observation and provides a pathway toward flexible, semantically aware LULC mapping with a reduced labeling burden.
Abstract
Land Use and Land Cover (LULC) mapping is a fundamental task in Earth Observation (EO). However, current LULC models are typically developed for a specific modality and a fixed class taxonomy, limiting their generalizability and broader applicability. Recent advances in foundation models (FMs) offer promising opportunities for building universal models. Yet task-agnostic FMs often require fine-tuning for downstream applications, whereas task-specific FMs rely on massive amounts of labeled data for training, which is costly and impractical in the remote sensing (RS) domain. To overcome these limitations, we propose LandSegmenter, an LULC FM framework that addresses challenges at the input, model, and output levels. On the input side, to alleviate the heavy demand for labeled data in FM training, we introduce LAnd Segment (LAS), a large-scale, multi-modal, multi-source dataset built primarily from globally sampled weak labels derived from existing LULC products. LAS provides a scalable, cost-effective alternative to manual annotation, enabling large-scale FM training across diverse LULC domains. At the model level, LandSegmenter integrates an RS-specific adapter for cross-modal feature extraction and a text encoder for enhanced semantic awareness. At the output stage, we introduce a class-wise confidence-guided fusion strategy to mitigate semantic omissions and further improve LandSegmenter's zero-shot performance. We evaluate LandSegmenter on six precisely annotated LULC datasets spanning diverse modalities and class taxonomies. Extensive transfer learning and zero-shot experiments demonstrate that LandSegmenter achieves competitive or superior performance, particularly in zero-shot settings when transferred to unseen datasets. These results highlight the efficacy of the proposed framework and the utility of weak supervision for building task-specific FMs.
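The abstract does not spell out the fusion rule, so the following is only an illustrative sketch of what a class-wise confidence-guided fusion could look like, not the authors' implementation. It assumes two hypothetical sources of per-class probability maps (`probs_a`, `probs_b`), defines a class's confidence as the mean probability over the pixels where that class wins, and weights each source per class by its relative confidence. The function names and the confidence definition are assumptions made for illustration.

```python
import numpy as np


def classwise_confidence(probs):
    """Per-class confidence for a (C, H, W) probability map.

    ASSUMPTION: confidence of class c is the mean probability of c over
    the pixels where c is the argmax; 0 if c is never predicted.
    """
    num_classes = probs.shape[0]
    pred = probs.argmax(axis=0)
    conf = np.zeros(num_classes)
    for c in range(num_classes):
        mask = pred == c
        if mask.any():
            conf[c] = probs[c][mask].mean()
    return conf


def confidence_guided_fusion(probs_a, probs_b, eps=1e-8):
    """Fuse two (C, H, W) probability maps with class-wise weights.

    ASSUMPTION: each class channel is a convex combination of the two
    sources, weighted by their relative class-wise confidence. Classes
    that neither source predicts fall back to equal weights.
    """
    conf_a = classwise_confidence(probs_a)
    conf_b = classwise_confidence(probs_b)
    total = conf_a + conf_b
    w_a = np.where(total > 0, conf_a / np.maximum(total, eps), 0.5)
    w_b = 1.0 - w_a
    fused = w_a[:, None, None] * probs_a + w_b[:, None, None] * probs_b
    # Renormalize so the class probabilities sum to 1 at every pixel.
    return fused / np.maximum(fused.sum(axis=0, keepdims=True), eps)


# Toy example: 2 classes, a 1x2 image. Source A never predicts class 1,
# source B never predicts class 0.
probs_a = np.array([[[0.9, 0.6]], [[0.1, 0.4]]])
probs_b = np.array([[[0.2, 0.3]], [[0.8, 0.7]]])
fused = confidence_guided_fusion(probs_a, probs_b)
labels = fused.argmax(axis=0)
```

In this toy case each source omits one class entirely, while the fused map predicts both, which is the kind of semantic omission the abstract says the fusion step is meant to mitigate.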
