Classifier Enhancement Using Extended Context and Domain Experts for Semantic Segmentation

Huadong Tang, Youpeng Zhao, Min Xu, Jun Wang, Qiang Wu

TL;DR

This work tackles the challenge of fixed, imbalanced pixel-wise classifiers in semantic segmentation by proposing the Extended Context-Aware Classifier (ECAC), which fuses a dynamically updated memory bank storing dataset-level class representations with image-level class centers. A teacher-student distillation framework (Teacher-ECAC and Student-ECAC) guided by ground-truth labels and a calibration stage further mitigates bias toward majority classes, enabling GT-like performance during inference. The approach is lightweight and plug-in, improving state-of-the-art results on ADE20K, COCO-Stuff10K, and Pascal-Context across both CNN- and Transformer-based backbones, with ablations showing the complementary impact of memory, distillation, and calibration. The proposed method significantly enhances minority-class segmentation and overall robustness with minimal computational overhead, providing a practical path to more balanced semantic segmentation in diverse datasets.

Abstract

Prevalent semantic segmentation methods generally adopt a vanilla classifier to categorize each pixel into specific classes. Although such a classifier learns global information from the training data, this information is represented by a set of fixed parameters (weights and biases). However, each image has a different class distribution, which prevents the classifier from addressing the unique characteristics of individual images. At the dataset level, class imbalance leads to segmentation results being biased towards majority classes, limiting the model's effectiveness in identifying and segmenting minority class regions. In this paper, we propose an Extended Context-Aware Classifier (ECAC) that dynamically adjusts the classifier using global (dataset-level) and local (image-level) contextual information. Specifically, we leverage a memory bank to learn dataset-level contextual information of each class, incorporating the class-specific contextual information from the current image to improve the classifier for precise pixel labeling. Additionally, a teacher-student network paradigm is adopted, where the domain expert (teacher network) dynamically adjusts contextual information with ground truth and transfers knowledge to the student network. Comprehensive experiments illustrate that the proposed ECAC can achieve state-of-the-art performance across several datasets, including ADE20K, COCO-Stuff10K, and Pascal-Context.
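The image-level (local) context mentioned above can be pictured as one mean feature vector per class present in the image. The following is a minimal sketch of that idea only, not the authors' implementation: the function name `class_centers`, the flattened `(H*W, d)` feature layout, and the use of a hard label map are all assumptions made for illustration. In a teacher-style branch the labels would come from the ground truth, while a student-style branch would have to rely on predictions.

```python
import numpy as np

def class_centers(features, labels, num_classes):
    """Mean feature vector per class for a single image (illustrative helper).

    features: (H*W, d) array of pixel features.
    labels:   (H*W,) array of integer class ids.
    Returns (centers, present): centers is (num_classes, d); rows for classes
    absent from the image stay zero and are flagged False in `present`.
    """
    d = features.shape[1]
    centers = np.zeros((num_classes, d), dtype=features.dtype)
    present = np.zeros(num_classes, dtype=bool)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            # Average the features of all pixels labelled as class c.
            centers[c] = features[mask].mean(axis=0)
            present[c] = True
    return centers, present

# Toy image with three pixels, two feature channels, three possible classes.
features = np.array([[1.0, 0.0], [3.0, 0.0], [0.0, 2.0]])
labels = np.array([0, 0, 2])
centers, present = class_centers(features, labels, num_classes=3)
```

In the toy example, class 1 does not appear in the image, so its row stays at zero and `present[1]` is `False`; downstream code would need to skip such rows rather than treat them as real centers.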

Paper Structure

This paper contains 31 sections, 21 equations, 19 figures, 8 tables.

Figures (19)

  • Figure 1: Analysis of inference complexity and accuracy on ADE20K. Each arrow shows the improvement achieved by our ECAC method: the lower end marks the performance of the original method, and the upper end marks the performance after incorporating ECAC. ECAC significantly improves segmentation performance while adding little computational overhead.
  • Figure 2: Overview of the proposed ECAC. The memory bank $\mathcal{M}$ stores dataset-level category information. Combining it with the within-image class centers yields an extended context-aware classifier. A calibration step is adopted to mitigate class imbalance. A teacher-student network transfers the comprehensive contextual information extracted with the ground-truth labels to further enhance the classifier. ($\bullet$ $\bullet$ $\bullet$) represent the mean features of different classes.
  • Figure 3: The memory bank update process. The memory bank $\mathcal{M}$ is initialized as an empty structure of size $n \times d$. We update it with a momentum-based approach (see the memory-update equation), where the $i$-th class representation $\mathcal{M}_{i}$ is refined by blending the previous memory bank state with the newly computed class-specific features $Z_i$.
  • Figure 7: Visualization of the feature distributions learned with DeepLabV3+ (left) and our ECAC (right).
  • Figure : (a) Image
  • ...and 14 more figures
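
The momentum-based memory bank update described in the Figure 3 caption can be sketched as follows. This is an illustration under stated assumptions rather than the paper's exact equation: the blend is assumed to be an exponential moving average, $\mathcal{M}_i \leftarrow m\,\mathcal{M}_i + (1-m)\,Z_i$, the "empty" bank is assumed to start as zeros, and the function name `update_memory`, the `present` mask, and the momentum value 0.9 are hypothetical.

```python
import numpy as np

def update_memory(memory, class_feats, present, momentum=0.9):
    """Momentum (moving-average) update of an (n, d) memory bank.

    memory:      (n, d) current dataset-level class representations.
    class_feats: (n, d) newly computed class-specific features Z.
    present:     (n,) boolean mask of classes observed in the current batch.
    Rows of classes that were not observed are left unchanged.
    """
    updated = memory.copy()
    # Assumed form: blend the previous state with the new class features.
    updated[present] = (
        momentum * memory[present] + (1.0 - momentum) * class_feats[present]
    )
    return updated

# Toy bank with n = 3 classes and d = 2 feature dimensions (zeros assumed).
memory = np.zeros((3, 2))
class_feats = np.array([[2.0, 0.0], [9.0, 9.0], [0.0, 2.0]])
present = np.array([True, False, True])
memory_next = update_memory(memory, class_feats, present, momentum=0.9)
```

With a high momentum the bank changes slowly, so a single batch moves each observed class row only a small step toward its new features, and the row of the unobserved class (index 1 here) is not touched at all.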