Center-guided Classifier for Semantic Segmentation of Remote Sensing Images
Wei Zhang, Mengting Ma, Yizhen Jiang, Rongrong Lian, Zhenkai Wu, Kangning Cui, Xiaowen Ma
TL;DR
CenterSeg addresses the large intraclass variance of remote sensing image (RSI) segmentation by replacing the standard parametric softmax classifier with a center-guided classifier that uses multiple prototypes per class, derived from local class centers. Prototypes are generated via ground-truth-guided feature aggregation and hard attention assignment with momentum updates, and the embedding space is regularized with two Grassmann-manifold terms, prototype-to-prototype orthogonality and feature-to-prototype alignment, which improve intraclass compactness and interclass separability. The approach is plug-and-play, lightweight, and interpretable, and achieves state-of-the-art or competitive results on the Vaihingen, Potsdam, and LoveDA datasets while remaining compatible with existing RSI segmentation backbones. In practice, it provides a transparent prototype-based decision mechanism with minimal extra storage, giving robust performance on RSI tasks with high intraclass variance.
Abstract
Compared with natural images, remote sensing images (RSIs) have a unique characteristic, i.e., larger intraclass variance, which makes semantic segmentation of remote sensing images more challenging. Moreover, existing semantic segmentation models for remote sensing images usually employ a vanilla softmax classifier, which has three drawbacks: (1) indirect supervision of the pixel representations during training; (2) inadequate modeling ability of parametric softmax classifiers under large intraclass variance; and (3) an opaque classification decision process. In this paper, we propose a novel classifier (called CenterSeg) customized for RSI semantic segmentation, which addresses the abovementioned problems with multiple prototypes, direct supervision on the Grassmann manifold, and an interpretability strategy. Specifically, for each class, CenterSeg obtains local class centers by aggregating the corresponding pixel features based on ground-truth masks, and generates multiple prototypes through hard attention assignment and momentum updating. In addition, we introduce the Grassmann manifold and constrain the joint embedding space of pixel features and prototypes with two additional regularization terms. Notably, during inference, CenterSeg can further provide interpretability by constraining each prototype to be a sample from the training set. Experimental results on three remote sensing segmentation datasets validate the effectiveness of the model. Besides its superior performance, CenterSeg is simple, lightweight, compatible with existing models, and interpretable. Code is available at https://github.com/xwmaxwma/rssegmentation.
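The prototype generation step described above (ground-truth-guided aggregation, hard assignment, momentum updating) can be illustrated with a minimal sketch. This is not the authors' implementation, which is available in the linked repository. It assumes that pixel features are flattened into a list of vectors, that local class centers are per-image means over pixels with the same label, that the assignment uses cosine similarity, and that the momentum is 0.9. The function names `local_class_centers` and `update_prototypes` are hypothetical, and the Grassmann-manifold regularization terms are omitted.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length (returned unchanged if it is all zeros)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def local_class_centers(features, labels):
    """Average the features of all pixels that share a ground-truth label.

    features: list of per-pixel feature vectors from one image (assumed layout)
    labels:   list of integer class ids, one per pixel
    Returns {class_id: mean feature vector} for the classes present.
    """
    dim = len(features[0])
    sums, counts = {}, {}
    for feat, cls in zip(features, labels):
        acc = sums.setdefault(cls, [0.0] * dim)
        for i, x in enumerate(feat):
            acc[i] += x
        counts[cls] = counts.get(cls, 0) + 1
    return {cls: [s / counts[cls] for s in acc] for cls, acc in sums.items()}


def update_prototypes(prototypes, centers, momentum=0.9):
    """Hard assignment followed by a momentum update, in place.

    prototypes: {class_id: list of unit-length prototype vectors}
    Each local center updates only its most similar prototype of the
    same class (cosine similarity is an assumption of this sketch).
    Returns {class_id: index of the prototype that was updated}.
    """
    assigned = {}
    for cls, center in centers.items():
        center = l2_normalize(center)
        protos = prototypes[cls]
        sims = [sum(a * b for a, b in zip(p, center)) for p in protos]
        k = max(range(len(protos)), key=sims.__getitem__)
        mixed = [momentum * p + (1.0 - momentum) * c
                 for p, c in zip(protos[k], center)]
        protos[k] = l2_normalize(mixed)
        assigned[cls] = k
    return assigned


# Toy example: three pixels, two classes, two prototypes per class.
features = [[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]]
labels = [0, 0, 1]
prototypes = {0: [[1.0, 0.0], [0.0, 1.0]],
              1: [[1.0, 0.0], [0.0, 1.0]]}

centers = local_class_centers(features, labels)
assigned = update_prototypes(prototypes, centers)
```

Because each local center moves only one prototype per class, the remaining prototypes of that class are free to track other modes of the class distribution, which is the motivation for using several prototypes under large intraclass variance.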
