Distilling Local Texture Features for Colorectal Tissue Classification in Low Data Regimes

Dmitry Demidov, Roba Al Majzoub, Amandeep Kumar, Fahad Khan

TL;DR

This work tackles colorectal tissue classification under severe data scarcity, where labeling is costly and some tissue classes are rare. It proposes KD-CTCNet, a two-branch CNN in which a local patch branch is guided by a global teacher branch through self-distillation to enrich local texture representations. The method optimizes a combined loss $\mathcal{L} = \tfrac{1}{2} \mathcal{L}_{main} + \tfrac{\alpha}{2} \mathcal{L}_{dist}$ with $\alpha=0.1$, samples local patches covering $10$–$50\%$ of the image, and switches to focal loss when very few samples per class are available. It reports improved accuracy over baselines on two CRC datasets. The results indicate that enhancing local texture encoding can substantially improve performance in low-data colorectal histopathology tasks; code and models are publicly available.
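The combined objective above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the summary does not say which form $\mathcal{L}_{dist}$ takes, so the KL divergence between the softmax outputs of the global (teacher) and local (student) branches is assumed here, and the focal-loss exponent `gamma=2.0` is likewise an assumed default, not a value taken from the paper. All function names are hypothetical.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of logits."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(z - m) for z in logits))
    return [z - lse for z in logits]


def focal_loss(logits, target, gamma=2.0):
    """Focal loss for one sample; gamma=0 reduces to cross-entropy.

    gamma=2.0 is an assumed default, not a value from the source.
    """
    log_pt = log_softmax(logits)[target]
    pt = math.exp(log_pt)
    return -((1.0 - pt) ** gamma) * log_pt


def cross_entropy(logits, target):
    """Standard cross-entropy for one sample (L_main)."""
    return focal_loss(logits, target, gamma=0.0)


def kl_divergence(teacher_logits, student_logits):
    """KL(teacher || student) over softmax outputs.

    Assumed form of L_dist; the source only says the logits are compared.
    """
    log_p = log_softmax(teacher_logits)
    log_q = log_softmax(student_logits)
    return sum(math.exp(lp) * (lp - lq) for lp, lq in zip(log_p, log_q))


def combined_loss(global_logits, local_logits, target, alpha=0.1, use_focal=False):
    """L = 1/2 * L_main + alpha/2 * L_dist, with alpha = 0.1 as in the text."""
    if use_focal:
        l_main = focal_loss(global_logits, target)
    else:
        l_main = cross_entropy(global_logits, target)
    l_dist = kl_divergence(global_logits, local_logits)
    return 0.5 * l_main + 0.5 * alpha * l_dist


if __name__ == "__main__":
    # Hypothetical logits for a 3-class problem.
    print(combined_loss([2.0, 0.5, -1.0], [1.5, 0.7, -0.5], target=0))
```

When both branches produce identical logits the distillation term vanishes and the objective reduces to half of the main loss, which is a quick sanity check for any real implementation.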

Abstract

Multi-class colorectal tissue classification is a challenging problem that is typically addressed in a setting where ample training data is assumed to be available. However, manual annotation of fine-grained colorectal tissue samples of multiple classes, especially rare ones such as stromal tumor and anal cancer, is laborious and expensive. To address this, we propose a knowledge distillation-based approach, named KD-CTCNet, that effectively captures local texture information from few tissue samples, through a distillation loss, to improve the standard CNN features. The resulting enriched feature representation achieves improved classification performance, specifically in low data regimes. Extensive experiments on two public datasets of colorectal tissues reveal the merits of the proposed contributions, with a consistent gain achieved over different approaches across low data settings. The code and models are publicly available on GitHub.

Paper Structure (8 sections, 2 equations, 3 figures, 3 tables)

Figures (3)

  • Figure 1: Samples from Kather-2016 (top) and Kather-2019 (bottom) datasets.
  • Figure 2: Overview of our proposed knowledge distillation-based framework, named KD-CTCNet, for CRC tissue classification in low data regimes. The framework strives to obtain enriched feature representations by explicitly capturing the inherent local texture patterns within the CRC data. The proposed KD-CTCNet comprises a conventional CNN stream (top branch) encoding features from the full image content with a standard cross-entropy loss ($\mathcal{L}_{main}$) and a parallel branch that is specifically designed to capture local texture feature representations by performing local image sampling. Both branches share the weights and their corresponding output logits are compared using a self-distillation loss ($\mathcal{L}_{dist}$). The resulting enriched feature representations are beneficial to obtain improved classification performance, especially in low data regimes.
  • Figure 3: Comparison of confusion matrices calculated on the test set with 20% of the available data for (a) vanilla ResNet-50 and (b) our KD-CTCNet approach.
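
The local image sampling used by the second branch in Figure 2 can be illustrated with a small stdlib-only sketch. It assumes that the "10–50% of the image" range refers to the fraction of the image area covered by a patch and that patches keep the image's aspect ratio; neither detail is confirmed by the text here. The helper names `sample_patch_box` and `crop` are hypothetical.

```python
import math
import random


def sample_patch_box(height, width, rng, min_frac=0.10, max_frac=0.50):
    """Pick a random crop box covering min_frac..max_frac of the image area.

    Assumption: the fraction is an area fraction and the aspect ratio is kept.
    Returns (top, left, patch_height, patch_width).
    """
    frac = rng.uniform(min_frac, max_frac)
    scale = math.sqrt(frac)  # scale both sides so the area scales by frac
    patch_h = max(1, round(height * scale))
    patch_w = max(1, round(width * scale))
    top = rng.randint(0, height - patch_h)  # randint is inclusive on both ends
    left = rng.randint(0, width - patch_w)
    return top, left, patch_h, patch_w


def crop(image, box):
    """Crop a 2D image stored as a list of rows."""
    top, left, patch_h, patch_w = box
    return [row[left:left + patch_w] for row in image[top:top + patch_h]]


if __name__ == "__main__":
    rng = random.Random(0)
    image = [[r * 8 + c for c in range(8)] for r in range(8)]  # toy 8x8 "image"
    box = sample_patch_box(8, 8, rng)
    patch = crop(image, box)
    print(box, len(patch), len(patch[0]))
```

In a setup like the one in Figure 2, the cropped patch would be passed through the same weight-shared network as the full image, and the two sets of logits would then be compared by the distillation loss.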