Structural and Statistical Texture Knowledge Distillation for Semantic Segmentation
Deyi Ji, Haoran Wang, Mingyuan Tao, Jianqiang Huang, Xian-Sheng Hua, Hongtao Lu
TL;DR
The paper addresses a gap in knowledge distillation for semantic segmentation: low-level texture cues are lost when only high-level knowledge is transferred. It introduces SSTKD, which combines a Contourlet Decomposition Module (CDM) for structural texture with a Denoised Texture Intensity Equalization Module (DTIEM) for statistical texture, and distills both from a teacher to a lighter student. The framework adds dedicated losses L_{str} and L_{sta} alongside the standard response and adversarial losses in a PSPNet-based teacher–student setup, and reports state-of-the-art results on Cityscapes, Pascal VOC 2012, and ADE20K. Distilling texture in this way targets sharper boundaries and better-matched intensity distributions in the student while keeping the student lightweight, which makes the approach practical for high-resolution semantic segmentation.
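As a rough illustration of how the four loss terms named above might be combined, the sketch below uses a plain weighted sum. The weighted-sum form, the function name `combined_distillation_loss`, and the weights `w_response`, `w_adv`, `w_str`, and `w_sta` are all assumptions made for illustration; this summary does not state how the terms are weighted, and any other training terms are omitted.

```python
def combined_distillation_loss(l_response, l_adv, l_str, l_sta,
                               w_response=1.0, w_adv=1.0, w_str=1.0, w_sta=1.0):
    """Weighted sum of the four distillation terms (hypothetical weighting).

    l_response: response (output-level) distillation loss
    l_adv:      adversarial distillation loss
    l_str:      structural texture loss, L_{str}
    l_sta:      statistical texture loss, L_{sta}
    """
    return (w_response * l_response
            + w_adv * l_adv
            + w_str * l_str
            + w_sta * l_sta)


# Placeholder scalar values, not results from the paper.
total = combined_distillation_loss(0.5, 0.1, 0.2, 0.3, w_str=2.0)
```

In practice each term would be a tensor-valued loss computed from teacher and student features; scalars are used here only to keep the sketch self-contained.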
Abstract
Existing knowledge distillation methods for semantic segmentation mainly focus on transferring high-level contextual knowledge from teacher to student. However, low-level texture knowledge is also vital for characterizing local structural patterns and global statistical properties, such as boundaries, smoothness, regularity, and color contrast, which may not be well captured by high-level deep features. In this paper, we aim to take full advantage of both structural and statistical texture knowledge and propose a novel Structural and Statistical Texture Knowledge Distillation (SSTKD) framework for semantic segmentation. Specifically, for structural texture knowledge, we introduce a Contourlet Decomposition Module (CDM) that decomposes low-level features with an iterative Laplacian pyramid and a directional filter bank to mine the structural texture knowledge. For statistical texture knowledge, we propose a Denoised Texture Intensity Equalization Module (DTIEM) that adaptively extracts and enhances statistical texture knowledge through heuristic iterative quantization and a denoising operation. Finally, each type of knowledge is supervised by its own loss function, forcing the student network to mimic the teacher more closely and from a broader perspective. Experiments show that the proposed method achieves state-of-the-art performance on the Cityscapes, Pascal VOC 2012, and ADE20K datasets.
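The two building blocks mentioned above can be illustrated with a minimal NumPy sketch. It is not the paper's CDM or DTIEM: `laplacian_level` shows a single Laplacian-pyramid split using 2x2 average pooling as a stand-in low-pass filter and omits the directional filter bank entirely, while `quantize_uniform` shows plain uniform quantization into intensity levels rather than the adaptive, iterative, denoised scheme. Both function names and the `levels` parameter are hypothetical.

```python
import numpy as np


def laplacian_level(x):
    """Split a 2D map (even height and width) into a coarse map and a band-pass residual."""
    h, w = x.shape
    # Low-pass and downsample in one step: 2x2 average pooling (assumed stand-in filter).
    low = x.reshape(h // 2, 2, w // 2, 2).mean(axis=(1, 3))
    # Upsample by nearest-neighbour repetition; the residual keeps the detail the coarse map lost.
    up = low.repeat(2, axis=0).repeat(2, axis=1)
    band = x - up
    return low, band


def quantize_uniform(x, levels=4):
    """Assign each value to one of `levels` uniform bins between min and max."""
    lo, hi = x.min(), x.max()
    # Guard against a constant map, where hi == lo.
    scale = (hi - lo) if hi > lo else 1.0
    # The maximum value would land in bin `levels`, so clamp it into the last bin.
    idx = np.minimum(((x - lo) / scale * levels).astype(int), levels - 1)
    counts = np.bincount(idx.ravel(), minlength=levels)
    return idx, counts


# Toy 4x4 "feature map" used only for illustration.
feature = np.arange(16, dtype=float).reshape(4, 4)
low, band = laplacian_level(feature)
idx, counts = quantize_uniform(feature, levels=4)
```

Applying `laplacian_level` again to `low` gives the next pyramid level, which is the sense in which the decomposition is iterative. The residual `band` is where boundary-like detail concentrates, and `counts` is a simple histogram describing the global intensity distribution.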
