Classifier Enhancement Using Extended Context and Domain Experts for Semantic Segmentation
Huadong Tang, Youpeng Zhao, Min Xu, Jun Wang, Qiang Wu
TL;DR
This work addresses two weaknesses of pixel-wise classifiers in semantic segmentation: their parameters are fixed after training, and class imbalance biases them toward majority classes. It proposes the Extended Context-Aware Classifier (ECAC), which fuses a dynamically updated memory bank storing dataset-level class representations with image-level class centers. A teacher-student distillation framework (Teacher-ECAC and Student-ECAC) lets the teacher use ground-truth labels during training and transfer that knowledge to the student, so that the student can approximate the teacher's ground-truth-guided behavior at inference, when labels are unavailable. A calibration stage further mitigates bias toward majority classes. The approach is a lightweight plug-in module that improves on state-of-the-art results on ADE20K, COCO-Stuff10K, and Pascal-Context across both CNN- and Transformer-based backbones, and ablations show that memory, distillation, and calibration contribute complementary gains. The method improves minority-class segmentation and overall robustness with little computational overhead, offering a practical path to more balanced semantic segmentation across diverse datasets.
Abstract
Prevalent semantic segmentation methods generally adopt a vanilla classifier to assign each pixel to a class. Although such a classifier learns global information from the training data, this information is represented by a set of fixed parameters (weights and biases). Because each image has a different class distribution, a fixed classifier cannot adapt to the characteristics of an individual image. At the dataset level, class imbalance biases segmentation results towards majority classes, limiting the model's effectiveness in identifying and segmenting minority-class regions. In this paper, we propose an Extended Context-Aware Classifier (ECAC) that dynamically adjusts the classifier using global (dataset-level) and local (image-level) contextual information. Specifically, we use a memory bank to learn dataset-level contextual information for each class and combine it with class-specific contextual information from the current image to improve the classifier for precise pixel labeling. Additionally, a teacher-student network paradigm is adopted, in which the domain expert (teacher network) dynamically adjusts contextual information using the ground truth and transfers knowledge to the student network. Comprehensive experiments show that the proposed ECAC achieves state-of-the-art performance on several datasets, including ADE20K, COCO-Stuff10K, and Pascal-Context.
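To make the idea of a context-adjusted classifier concrete, the following is a minimal sketch in plain Python on a toy example. It is not the authors' implementation. The additive fusion rule, the exponential-moving-average memory update, the KL-based distillation term, the parameters `alpha` and `momentum`, and all function names are assumptions chosen for illustration. The sketch computes image-level class centers, updates a dataset-level memory bank, adds the blended context to a fixed classifier, and compares a teacher (centers from ground-truth labels) with a student (centers from coarse predictions).

```python
from math import exp, log

def softmax(row):
    m = max(row)
    exps = [exp(v - m) for v in row]
    total = sum(exps)
    return [v / total for v in exps]

def class_centers(features, assignment, eps=1e-6):
    """Weighted mean feature per class.
    features: N rows of D floats; assignment: N rows of K weights (soft or one-hot)."""
    num_classes, dim = len(assignment[0]), len(features[0])
    centers = []
    for k in range(num_classes):
        mass = sum(a[k] for a in assignment)
        centers.append([
            sum(a[k] * f[d] for a, f in zip(assignment, features)) / (mass + eps)
            for d in range(dim)
        ])
    return centers

def update_memory(memory, centers, present, momentum=0.9):
    """Assumed EMA update, applied only to classes present in the current image."""
    return [
        [momentum * m + (1 - momentum) * c for m, c in zip(mem_k, cen_k)]
        if seen else list(mem_k)
        for mem_k, cen_k, seen in zip(memory, centers, present)
    ]

def context_aware_logits(features, base_weights, memory, centers, alpha=0.5):
    """Add blended dataset-level and image-level context to the fixed weights
    (assumed additive fusion), then score every pixel against every class."""
    adjusted = [
        [w + alpha * m + (1 - alpha) * c for w, m, c in zip(w_k, m_k, c_k)]
        for w_k, m_k, c_k in zip(base_weights, memory, centers)
    ]
    return [[sum(x * w for x, w in zip(f, w_k)) for w_k in adjusted] for f in features]

def distillation_kl(teacher_logits, student_logits, eps=1e-12):
    """Mean per-pixel KL(teacher || student) over softmax outputs."""
    total = 0.0
    for t_row, s_row in zip(teacher_logits, student_logits):
        p_t, p_s = softmax(t_row), softmax(s_row)
        total += sum(t * (log(t + eps) - log(s + eps)) for t, s in zip(p_t, p_s))
    return total / len(teacher_logits)

# Toy image: 4 pixels, 2-D features, 2 classes (all values are made up).
features = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
labels = [0, 0, 1, 1]
base_weights = [[0.5, -0.5], [-0.5, 0.5]]   # fixed "vanilla" classifier
memory = [[0.0, 0.0], [0.0, 0.0]]           # dataset-level memory bank

# Teacher assigns pixels to classes with ground truth; student uses coarse predictions.
one_hot = [[1.0 if k == y else 0.0 for k in range(2)] for y in labels]
coarse = [softmax([sum(x * w for x, w in zip(f, w_k)) for w_k in base_weights])
          for f in features]

teacher_centers = class_centers(features, one_hot)
student_centers = class_centers(features, coarse)
memory = update_memory(memory, teacher_centers, present=[True, True])

teacher_logits = context_aware_logits(features, base_weights, memory, teacher_centers)
student_logits = context_aware_logits(features, base_weights, memory, student_centers)
loss = distillation_kl(teacher_logits, student_logits)
```

In this sketch the only difference between teacher and student is the source of the pixel-to-class assignment, which is why a distillation term between their outputs can push the student toward the teacher's label-informed behavior. The paper's actual fusion, memory update, and calibration stage may differ from these assumed forms.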
