Distilling Local Texture Features for Colorectal Tissue Classification in Low Data Regimes
Dmitry Demidov, Roba Al Majzoub, Amandeep Kumar, Fahad Khan
TL;DR
This work tackles colorectal tissue classification under severe data scarcity, where annotation is costly and some tissue classes are rare. It proposes KD-CTCNet, a two-branch CNN in which a local patch branch is guided by a global teacher through self-distillation to enrich local texture representations. The method optimizes a combined loss $\mathcal{L} = \tfrac{1}{2} \mathcal{L}_{main} + \alpha \tfrac{1}{2} \mathcal{L}_{dist}$ with $\alpha=0.1$, samples local patches from $10$–$50\%$ of the image, and uses focal loss when the number of samples per class is very small. It achieves improved accuracy over baselines on two CRC datasets. The results indicate that enhancing local texture encoding can substantially improve performance in low-data colorectal histopathology tasks, and the publicly available code and models support adoption in clinical research.
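The loss weighting, patch sampling, and focal loss mentioned above can be illustrated with a minimal, framework-free sketch. This is not the authors' implementation. It assumes that the two loss terms arrive as precomputed scalars, that "10–50% of the image" refers to the patch area as a fraction of the image area with the image's aspect ratio preserved, and that the focal loss uses the commonly cited focusing parameter gamma = 2. The function names `combined_loss`, `sample_patch_box`, and `focal_loss` are hypothetical.

```python
import math
import random

ALPHA = 0.1  # distillation weight stated in the summary


def combined_loss(l_main, l_dist, alpha=ALPHA):
    """L = 1/2 * L_main + alpha * 1/2 * L_dist, for scalar loss values."""
    return 0.5 * l_main + alpha * 0.5 * l_dist


def sample_patch_box(height, width, min_frac=0.10, max_frac=0.50, rng=random):
    """Sample a patch as (top, left, patch_h, patch_w).

    Assumption: the fraction is interpreted as patch area / image area,
    and the patch keeps the aspect ratio of the image.
    """
    frac = rng.uniform(min_frac, max_frac)
    side_scale = math.sqrt(frac)  # scale each side so the area scales by frac
    patch_h = max(1, round(height * side_scale))
    patch_w = max(1, round(width * side_scale))
    top = rng.randint(0, height - patch_h)   # randint is inclusive on both ends
    left = rng.randint(0, width - patch_w)
    return top, left, patch_h, patch_w


def focal_loss(p_true, gamma=2.0):
    """Focal loss for one sample, given the predicted probability of the true class.

    Assumption: gamma = 2.0 is a common default, not a value taken from the text.
    With gamma = 0 this reduces to the ordinary cross-entropy term -log(p_true).
    """
    return -((1.0 - p_true) ** gamma) * math.log(p_true)


if __name__ == "__main__":
    rng = random.Random(0)
    print("combined loss:", combined_loss(l_main=1.0, l_dist=2.0))
    print("patch box (224x224 image):", sample_patch_box(224, 224, rng=rng))
    print("focal loss at p=0.9:", focal_loss(0.9))
```

With $\alpha = 0.1$ the distillation term contributes only a small share of the total loss, so in this reading it acts as a regularizer on the local branch rather than as the dominant objective.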
Abstract
Multi-class colorectal tissue classification is a challenging problem that is typically addressed in a setting where ample training data are assumed to be available. However, manual annotation of fine-grained colorectal tissue samples of multiple classes, especially rare ones such as stromal tumor and anal cancer, is laborious and expensive. To address this, we propose a knowledge distillation-based approach, named KD-CTCNet, that effectively captures local texture information from few tissue samples through a distillation loss to improve the standard CNN features. The resulting enriched feature representation achieves improved classification performance, specifically in low data regimes. Extensive experiments on two public datasets of colorectal tissues reveal the merits of the proposed contributions, with a consistent gain over different approaches across low data settings. The code and models are publicly available on GitHub.
